I have listed the questions for this inquiry before, but want to make sure they stay fresh in mind when I start a new topic related to them, so here they are, again.
1. What is measurement?
2. How is it done?
3. How did it develop, when and why?
4. Why is it so important: what is the function of measuring?
5. Is it being used appropriately?
I would like to post the elements again as well, so that they are easy to refer to.
- Measurement is a social activity, in that measurement is not usually for one person only
- Measurement is done to something (object, process, performance) in order to capture some characteristic of that something, for comparison, communication or replication
- There must be sufficient language to communicate the measurement of that something’s characteristic(s), not just the words but the concepts behind the words
- A way to record or capture the measurement of that something beyond language, such as writing, symbolic marks, numerical system, etc.
- Scales against which to compare the measurement, either previous measurements done in a similar fashion or perhaps, some standards
The next topic that I need to discuss is the fifth question: Is the measuring appropriate and are the results being used appropriately? Since this question has been raised without proving that it is an important one, I think I need to show that it is worth asking. I hope not only to show that it is, but also to begin to develop some criteria for determining whether any measurement is being used appropriately.
In one of the books that I read while preparing this, The Measure of All Things, the author, Ken Alder, describes a fundamental tension in measuring, whether of things, processes or performance: the tension between standardized, uniform metrics and situationally derived metrics.
The book describes the epic efforts of the savants of France, in the years following the French Revolution of 1789, to develop a standard unit of length, now known as the “meter”, that was based on nature and could be used universally. It was a grand ambition, but one in accord with the tenor of the times. And since the meter has been adopted nearly universally, despite setbacks, mistakes and refocusing, they were ultimately successful, though it is easier to look back and judge success than it was at the time of the efforts and for a number of years afterwards.
The question the savants tried to address was how to standardize measurements so that, for instance, a bushel of grain purchased in one place in France was easily recognized as the same amount elsewhere in France, and ultimately the world. (A little background and a look forward: there were few “standards” in Europe from the fall of the Roman Empire until after the Renaissance that were more than local in scope. This will be the subject of some future posts.) The drive to standardize was meant to facilitate, among other things, equitable dealing in commerce. However, once the savants reached agreement on the first standards, their French compatriots resisted converting the old localized measures to the new metric standards. The savants had the backing of the revolutionary government for a time and tried to force the conversion, unsuccessfully. Forcing and enforcing acceptance of any new standard of measurement is a difficult task, one that I see in my own business dealings and that I will discuss later. And there is good reason for the difficulty, for the tension:
[Marie-Jean-Antoine-Nicolas Caritat de] Condorcet, the dead optimist of liberation, had naively imagined a world in which universal law, derived from nature’s truth, could produce equality and freedom without contradiction. [Benjamin] Constant, the living pessimist of liberation, had witnessed how uniformity, enforced by mass mobilization, could suppress difference of thought and custom. Both men, of course, were right. Their aspirations and fears remain the two poles of the axis around which the modern world still revolves.1
What Constant witnessed, the suppression of difference among men, hints at the type of tyranny that standardized measurement, when actually applied to human behavior, can lead to.
My take is that having standardized measurements facilitates not just communication but the ability to get work done, whether it is building tangible objects, performing commercial transactions, or sending probes to Mars. A classic example of not using a single standard measurement showed up in the failure of the Mars Climate Orbiter, when the engineers used English units for the navigation commands, while the NASA scientists set the course in the metric system. What must have happened, based on the report describing what went wrong, is that the computer underestimated the force of the thrusters: it did a burn expecting to get 1 newton of force, but got 1 pound of force, and 1 pound of force equals about 4.45 newtons. The result was that the Orbiter did its last burn, and since then, there has been no contact whatever. It may have crashed, burned up in the Martian atmosphere, or, as one note that I saw said, it may have flown past Mars and gone into an orbit around the sun.
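The arithmetic behind that mismatch can be sketched in a few lines of Python. This is only an illustration of the unit confusion, not a reconstruction of the Orbiter's actual software: the function names and the sample value of 1.0 are made up for the example, and the only real-world fact used is the conversion factor between pounds of force and newtons.

```python
# Illustrative sketch only: names and sample values are hypothetical,
# not taken from any real flight or ground software.

LBF_TO_NEWTON = 4.4482216152605  # one pound of force expressed in newtons


def lbf_to_newtons(force_lbf: float) -> float:
    """Convert a force given in pounds of force to newtons."""
    return force_lbf * LBF_TO_NEWTON


def underestimate_factor(reported_value: float) -> float:
    """How far off is a reader who takes a pound-force figure to be newtons?

    The same bare number is interpreted two ways: as newtons (the mistaken
    reading) and as pounds of force converted to newtons (the true value).
    """
    assumed_newtons = reported_value                 # mistaken reading
    actual_newtons = lbf_to_newtons(reported_value)  # what really happened
    return actual_newtons / assumed_newtons


if __name__ == "__main__":
    reported = 1.0  # a bare number with no unit attached
    print(f"Read as newtons:         {reported:.2f} N")
    print(f"Actually pounds of force: {lbf_to_newtons(reported):.2f} N")
    print(f"Underestimated by a factor of {underestimate_factor(reported):.2f}")
```

The point of the sketch is that the number itself carries no unit. Unless the unit travels with the value, or both sides agree on a single standard, nothing in the data reveals the mistake.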
More than just using the same standard can be involved, though. In France before their Revolution, taxation was locally done, and based not just on a standard tax for each amount of land owned, but also the quality of the land. Some land is more productive: richer soil, closer to watering sources, etc. When the post-Revolution surveyors came through to establish the actual length of the meter, they were misunderstood to be measuring the land so that it could be equally taxed based on area only. In this case, applying a single standard would have compounded the unfairness of the taxation. If standards are set up in the wrong way, while apparently using a methodology that mimics scientific methods, people and their lives will be unfairly judged.
To see some examples of this, I would like to bring up and discuss a truly important book by Stephen Jay Gould: The Mismeasure of Man. If you have read this book, I need say nothing more: you are already convinced that measurement can be used in ways that artificially limit the choices of humans, and can consign them to lives of subordination. If you have not, I strongly recommend it, since he will have said all of this in a more eloquent way than I can. I will try to provide some fresh comments about his book and about its subjects, however.
I used the version of Gould’s book that he revised and expanded in 1996, rather than the 1981 version. In the new introduction to it that Gould provided, he laid out the theme and purpose of the book in one succinct sentence:
The Mismeasure of Man treats one particular form of quantified claim about the ranking of human groups: the argument that intelligence can be meaningfully abstracted as a single number capable of ranking all people on a linear scale of intrinsic and unalterable mental worth.2
In the book he described a set of different ways to mismeasure intelligence within the context of the larger philosophical issue of biological determinism: the belief that what one is born with, his or her biological nature, determines entirely what one is capable of in life, and can be used by others to justify sociological situations. If a person’s skin is black (biology), then they are of lower intelligence (biology? environment?) and thus, deserve their poverty (sociology), runs the argument. Gould saw IQ, a single number “derived”, as its proponents describe it, or “assigned” as its detractors describe it, as being used to limit people by classifying them as stupid or smart, feeble-minded or normal. In his words:
This book, then, is about the abstraction of intelligence as a single entity, its location within the brain, its quantification as one number for each individual, and the use of these numbers to rank people in a single series of worthiness, invariably to find that oppressed and disadvantaged groups – races, classes, or sexes – are innately inferior and deserve their status.3
He explained how those who “evaluated” and ranked people on a scale from most valuable to somehow subhuman developed quasi-scientific measurements to justify their racism. The scales developed ran from Craniology (measure of skull shape and size) to IQ as a single number. Some believed that the races could be pyramidally arranged, with the white race at the top and Indians and Negros at the bottom, or that they were actually different species. As a result of their beliefs, they found apparent “scientific” justification, by biasing their reading of the data. The book is well done and thorough, describing how the data was cooked to fit their pre-conceived notions. It is, thus, a cautionary tale for anyone who wants to measure humans in any way, as opposed to just “men” – he describes his title as being part of his message, since those who did the measurement used white men (no women) as the standard against which everyone else was compared.
The elements of measurement that I have been using are present in the ways that the mismeasurement occurred. The social aspect and presence of language are easy to see. In each case, some form of concept was developed to be the framework in which whatever was going to be measured could be compared against the standards that that concept postulated. One of the proponents of the inequality of human races, Samuel George Morton, used as his operative concept the postulate that larger skull capacity represented greater intelligence. Based on this, he collected skulls of Caucasians, Mongolians, Malayans, Egyptians, American Indians, and Negros.
First, Gould reviewed Morton’s method of measuring the capacities of the skulls, finding fault with his measurements but, more importantly, with his selection of samples. For instance, all of his Caucasian measurements were made with male skulls, males being generally larger than females, and not just in skull size, while a large percentage of his Negro skulls were those of females. He also rounded the numbers for those races he disdained down, and the numbers for those he favored up. Finally, a no-no in statistical analysis, he selectively ignored some of his data. When Gould analyzed the full data, they showed that the capacities of all races differed by insignificant amounts. Since Morton used his concept, derived numbers in a biased way, and then delivered the results of his measurements as if they were “scientific”, he provided support for those who argued that there were races of greater worth and races of lesser worth that deserved their lower standing.
In this case, the concept was faulty and the numbers were cooked to support the foregone conclusion. The consequences were drastic injustice. If we are to use measurement for making decisions about humans, whether behavior, performance, or capabilities, we must ensure that the concepts are correct, and that the data is carefully examined, understood and used correctly.
Gould’s book continues to explore the data of a number of studies that support inequality by trying to explain variation in brain size, weight, cranial capacity and other traits (such as “the ratio of radius (lower arm bone) to humerus (upper arm bone)”4) in ways that support the preconceived worthiness of the races. Invariably, he found that the data had been cooked to support the preconceptions of the proponents. This is not an easy book, but it is important.
Knowing of Gould’s leftist proclivities, one must always show some caution accepting what he said as well, since his pre-conceived notions could have, but may not have, affected his conclusions and his reading of the data in much the same way as he accuses the proponents of racial inequality. In the case of this book, though, I find it difficult to fault his reviews of the actual data assembled by the apologists for inequality.
Giving the matter some additional thought, I should note that one of the reasons for re-issuing the book was that in 1994, a book called The Bell Curve by Richard J. Herrnstein and Charles Murray, was published, a book that I’ve not read. In his revised and expanded edition, Gould included a critique of The Bell Curve as an appendix, describing it as effectively being answered by his arguments in his earlier version, but then he goes through the way that he finds that Herrnstein and Murray have reused the techniques of the earlier inequality advocates. The rethinking that I’ve done is that to characterize Gould’s book as simply an exploration of the use of scientific-like practices to advance a pernicious doctrine is incomplete: his book is a book of advocacy.
The method that is described – using a pre-conceived idea, and then finding data to back it up – sounds like the inverse of the scientific method that I was originally taught. What I had learned was that one measures, then uses the data to draw conclusions. I have learned, after reading Thomas Kuhn’s book, The Structure of Scientific Revolutions, that most of scientific work starts with a working hypothesis, measures or otherwise collects data, and then draws its conclusions, which may or may not support the working hypothesis. I plan to discuss this much more fully in a subsequent post, but to use this simple description here will allow me to highlight the contrast between real scientific methodology, which follows the data – the measurements – wherever they may lead, and the pseudo-science of cooking the data to support the working hypothesis.
So that you know where my bias is, I probably fall on the same side as Gould’s politics, although my leftish leanings may not be as strong as his. I do firmly believe that there are inequalities in the way that people are treated and treat each other; that objectivity is a nice goal but generally illusory; and that there is a right and a wrong way to treat data. The inequalities that I see are not necessarily a result of genetic capabilities, nor are they immutable; rather, I believe that the greatest part of an individual’s character, behavior and abilities is a result of post-birth influences, which is not solely a “nurturist” position. I have learned enough of the biochemistry of growth, and the way that genetic drivers invoke timed releases of enzymes and proteins to foster growth not only before birth but afterwards as well, to believe that there is a continual interplay of the “nature” and “nurture” influences in human development.
Be that as it may, I started this post by stating that I wanted to convince you that there are times when measurement is misused to the detriment of people, and that perhaps there are some criteria that can help determine when measurement is misused. What I’ve shown so far are a couple of cautions about misusing “scientific data”. Are these the only places where misuse could occur? Of course not.
Posted on a blog on the New York Times is an article discussing performance reviews in the workplace. Tara Parker-Pope mentions a book by Samuel A. Culbert called “Get Rid of the Performance Review!”: “Annual reviews not only create a high level of stress for workers, he argues, but end up making everybody – bosses and subordinates – less effective at their jobs.”5 I have not read his book, just this blog entry, but evidently he has accumulated a great deal of anecdotal information about the deleterious effects on individuals of performance reviews, in addition to other, more objective types of information regarding these effects. The blog made sense to me: having used one company’s form of reviews as both the managed and the manager, I could only see a value to them as a check box item – yes, I’ve done it, but I don’t see the value other than to assert managerial authority.
The concern is that while giving the appearance of objectivity, the evaluation numbers are very subjective. The same can be said of many of the “customer satisfaction” surveys that I’ve been asked to take as a call-in customer to a call center.
The surveys I object to are personal reviews of agents’ performance, and do not measure satisfaction. I work my way through a poorly designed IVR script, find myself navigating incorrectly, and finally figure out how to ask for a live agent. When I reach the agent, he or she generally tries diligently to do what I requested or to answer the question I asked. Then, up comes the survey. The survey asks me how the agent has done in performing their work, and whether or not they have been cheerful. Nowhere does it let me express my complete frustration and dissatisfaction with the company and its awful IVR script! Now, not only have I been frustrated in dealing with the company, except for the interaction with the agent, but I’ve been frustrated in expressing the frustration.
This is what I’ve told my customers is an example of the satisfaction survey written from the point of view of the company, not from the point of view of the customer. Imagine having the company ask you questions about your relationship with them that you want to answer, not asking questions that they can use to rate themselves better for a presentation to the executive in charge of giving them a raise (potentially), who will only see your data points as they contribute to a graph rolling up everyone else’s data points. I rarely agree to take companies’ customer satisfaction surveys, and if I do agree, I don’t complete them when they turn out not to ask questions I want to answer.
In all of these situations, the method used is to mimic objective, scientific measurement methods. However, this faux-scientific method is not only not objective; judging by the results, it is usually self-serving. That is true of those who justify racism by using faulty concepts for their evaluations and biasing the results, and of those who are convinced that the numbers on their customer satisfaction surveys mean that customers are, in general, quite satisfied with their service.
There are other instances of this, but I will hold off on analyzing them for the time being, and will get to them at some point in the future. For now, it is enough to know that there is a potential dark side to measurement.
Special thanks to Stephen Darlington for helping me figure out the name of the space shot that failed. While not directly concerned with measurement, his blog is at
and has a number of categories with great information.
1 Alder, Ken, The Measure of All Things, Free Press, a division of Simon & Schuster, Inc., New York, NY, 2002, p. 317.
2 Gould, Stephen Jay, The Mismeasure of Man, revised and expanded, W.W. Norton & Company, New York, NY, 1996, 1981, p. 20.
3 Gould, Stephen Jay, The Mismeasure of Man, revised and expanded, W.W. Norton & Company, New York, NY, 1996, 1981, pp. 56-57.
4 Gould, Stephen Jay, The Mismeasure of Man, revised and expanded, W.W. Norton & Company, New York, NY, 1996, 1981, p. 118.
5 Parker-Pope, Tara, http://well.blogs.nytimes.com/2010/05/17/time-to-review-workplace-reviews/