In a post in November 2010, called “2 Geodetic Surveys”, I described a geodetic survey done in Peru in the 18th century by a group of French scientists. One technique they used, and variations of it are always used in well-done surveys, was to have two teams perform the same measurements and calculations and compare the results. The French were trying to measure the length of a degree of arc near the equator. Using a French unit of length, the toise (6 Paris feet, or about 6.39 English feet), one team got a result for a degree of latitude of 56,749 toises and the other team’s result was 56,768 toises. This is a difference of 19 toises, or approximately 0.0335%. That’s about 121 (English) feet difference over about 68.73 miles, which is about 362,894 feet.1 Those results are in incredible agreement, especially if you factor in the difficulties that they had obtaining any results, let alone these results.
The basis of information for the other survey discussed in that same post is Ken Alder’s book, The Measure of All Things. This other survey was the measurement of a meridian arc from which to establish the length of the meter, which was performed subsequent to the French Revolution by Pierre Mechain (1744-1804) and Jean-Baptiste-Joseph Delambre (1749-1822). They too were successful, ultimately, but not without difficulty. In “Chapter 11 – Mechain’s Mistake, Delambre’s Peace”, Alder describes a strange condition that had developed over the course of the survey. Mechain, as described by Alder, was a meticulous astronomer, given to depression, doubt and obsessive attention to detail. His own measurements neither agreed exactly nor met his exacting standards, though they are, in retrospect, some of the best that had been done to that date. He ended up fudging results, changing figures to make himself look better but, in essence, trying to cover up “mistakes” that were, to him, intolerable. When Delambre received Mechain’s raw data, after Mechain died of yellow fever while trying to correct the observations, the data was not in clear, bound notebooks but on scraps of paper with erasures, missing dates, etc. Delambre carried forward his colleague’s cover-up, cleaning it up so that it was presentable enough, because he found that Mechain’s ultimate results were “correct”, erring mostly in the size of the variations among his observations. All the erasures and corrections did not affect the result, but made Mechain’s work look as if it had been performed better than it actually was. Various instrumental problems, such as wear and lack of calibration, have been blamed, but midway through the discussion, Alder makes a statement that opened up a whole different path for discussing measurement.
In the end, Mechain came to blame himself – to his eternal shame and torment. …There is, however, one other possibility. What if nothing and no one was to blame? Indeed, what if there was no meaningful discrepancy at all? That is: what if the error lay neither in nature nor in Mechain’s manner of observation, but in the way he understood error? Twenty-five years after Mechain’s death, a young astronomer named Jean-Nicolas Nicollet [1786-1843] showed how this might be the case…
Mechain and his contemporaries did not make a principled distinction between precision (the internal consistency of results) and accuracy (the degree to which those results approached the “right answer”).2
This raises several questions, and as usual with questions, some of the answers lead to even more questions. Let’s start with “the way he understood error”. What can that possibly mean? I was taught, in math anyway, there is a right answer and there are wrong answers. I have heard about degrees of wrong, as in “a little wrong”, or “flat-out wrong”, but I can’t remember degrees of right.
Actually, the definitions for precision and accuracy stated in the quote get us pointed in the right direction. If we think about the first mentioned survey, where there was a difference of 19 toises between two results, this is a situation that occurs regularly in all measurement: the first measurement of something may not agree with the second or any subsequent measurement. What is causing the problem? Is it the temperature? Perhaps it was warmer on day two than on day one, so the metal tape measure or ruler had “expanded” a little with the heat. That would result in what was measured registering as a little smaller on day two. Maybe the tape measure or ruler was sufficiently constant, but since you were measuring a piece of wood and it rained during the night, the damp caused the wood to swell. Maybe on day one you did the measurement, and on day two you had your friend do it. There can be individual human variations in the way that measurements are captured. So how can anything be measured accurately and with precision?
The method that has been developed in the last 200 or so years is called statistical analysis, and includes techniques and mathematical concepts for dealing with multiple “measurements”, also known as data points. (I can hear groans, and yes, I feel the same way about statistics – if ever there was a boring subject…). If you are going to talk about measurement, though, statistics cannot be ignored.
One important concept in statistics is the mean of a data set: commonly known as an average. For our Peruvian measurements, 56,749 and 56,768 toises, we add them together and divide by two: 56,758.5 toises. But we can provide a little more information by indicating the range of the two measurements if we add ± 9.5 toises. The “±” means plus or minus, so if you add 9.5 to 56,758.5, you get 56,768, and subtracting 9.5 from our average or mean gives 56,749. So, a common way to express a result of measurements is 56,758.5±9.5 toises.
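The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean_and_half_range` is my own invention for illustration, and the two values are the Peruvian results quoted earlier.

```python
def mean_and_half_range(values):
    """Return (mean, half_range) so the data can be written as mean ± half_range."""
    mean = sum(values) / len(values)
    # Half the distance between the largest and smallest measurement.
    half_range = (max(values) - min(values)) / 2
    return mean, half_range


# The two results for a degree of arc, in toises.
degree_toises = [56749, 56768]

mean, half_range = mean_and_half_range(degree_toises)
print(f"{mean} ± {half_range} toises")  # prints "56758.5 ± 9.5 toises"
```

With only two measurements, the mean plus or minus the half-range gives back the two original values exactly. With three or more, the mean is generally not the midpoint of the range, which is one reason other measures of spread are usually reported instead.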
This would be impractical, say, for a cabinet maker who wants the edges of the drawers to line up correctly: cut a piece that is 14.5±0.25 inches. The phrase drilled into apprentice cabinet makers, carpenters, etc., is “measure twice, cut once”. They must use a number that does not have the “±”, because the average is not going to give them a clean finished product. For certain other types of measurement, though, “±” is useful.
I started the questions with Mechain’s understanding of error. He was obsessively concerned with details and had built his reputation on the accuracy of his observations, so when he found that the variation in his multiple observations was too high, he evidently thought this reflected on his ability to provide accurate observations. What Nicollet noticed, however, was that over time, Mechain’s “errors” tended in a similar direction. He suggested that wear in Mechain’s instrument had made “level” actually be tilted in the same direction every time the instrument was set up, and that the tilt got worse over the seven years of observing.
The trick was to compensate for any change in the instrument’s verticality by balancing the data for stars which passed north of the zenith (the highest point of the midnight sky) against those which passed south of it. Because Mechain had measured so many extra stars, such an operation was possible.3
What Nicollet did, then, was to average the north-passing stars, then average the south-passing stars, and then average the two averages. He discovered that the most egregious error, which looked as if there was a difference in the readings of nearly 400 feet, was actually accurate to within 40 feet.
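To see why averaging the two averages helps, here is a small Python sketch. Everything in it is made up for illustration: the numbers are not Mechain’s data, and the assumption that a tilt shifts north-passing stars one way and south-passing stars the other way by the same amount is a simplification of what Alder describes.

```python
from statistics import mean

# Hypothetical values in arbitrary units; none of this is Mechain's data.
TRUE_VALUE = 10.0  # assumed "right answer", for illustration only
TILT = 1.5         # assumed constant instrument tilt

# Assumption: the tilt pushes north-passing readings up and
# south-passing readings down by the same amount, plus small random scatter.
north = [TRUE_VALUE + TILT + e for e in (0.2, -0.1, 0.0, -0.1)]
south = [TRUE_VALUE - TILT + e for e in (0.1, -0.1)]

# Lumping all readings together lets the larger group dominate.
pooled = mean(north + south)

# Averaging each group first, then averaging the two averages,
# gives both directions equal weight so the tilt cancels.
balanced = mean([mean(north), mean(south)])

print(f"pooled:   {pooled:.2f}")    # prints "pooled:   10.50"
print(f"balanced: {balanced:.2f}")  # prints "balanced: 10.00"
```

In this toy example there are more north-passing than south-passing readings, so the plain average is pulled toward the tilted side, while the average of the two averages lands back on the assumed true value.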
By the time Nicollet started working on Mechain’s data, some new techniques had been developed by several mathematicians, but since the development of new techniques is rarely without controversy, these were no exception. Adrien-Marie Legendre (1752-1833) developed a method called “…the least squares method, which has broad application in linear regression, signal processing, statistics and curve fitting.”4 In 1805, he published his discovery. The Prince of Mathematicians, Carl Friedrich Gauss (1777-1855), noted for not publishing nearly as much as he should have, announced that he had been using the method since 1795, and pretty much proved his case by having predicted a position for Ceres in 1801.
Ceres, a dwarf planet/asteroid in the asteroid belt between Mars and Jupiter, had been observed early in 1801 for a short time by Giuseppe Piazzi (1746-1826), and the observed positions were published in September 1801. By then, Ceres had disappeared into the glare of the sun. Gauss used the observations to calculate and then announce where it would re-appear and when. He was correct to within half a degree, and Ceres was found again. Among the calculations he used was the least squares method, which fits a model to the actual observations by making the sum of the squared differences between the model and the observations as small as possible, and which assumes that the experimental or observational errors have normal distributions. A “normal” distribution is one name for the famous “bell curve”; another name for it is a Gaussian distribution.
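As a rough illustration of what “least squares” means, the sketch below fits a straight line to five made-up points. It is not Gauss’s orbit calculation, which was far more involved; the data, the function names `fit_line` and `sum_sq`, and the choice of a straight line as the model are all assumptions for the example.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept (standard closed-form formulas)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_sq(slope, intercept, xs, ys):
    """Sum of squared differences between the line and the observations."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up "observations" that lie roughly, but not exactly, on a line.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"slope {slope:.2f}, intercept {intercept:.2f}")  # prints "slope 1.97, intercept 1.06"

# Any other line does worse: nudging the slope increases the sum of squares.
print(sum_sq(slope, intercept, xs, ys) < sum_sq(slope + 0.1, intercept, xs, ys))  # prints "True"
```

The fitted line does not pass through every point. It is simply the line for which the total of the squared misses is smallest, which is the sense in which the model “matches” the observations.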
Regardless of who was first, this was among the techniques that have been used to apply statistical analysis to all sorts of problems. Are statistics a form of measurement? Well, not really, but….
In order to deal with the unavoidable variance of measurements in a number of fields, “errors” that appear to arise just from the process of measuring, a method had to be developed. There were a number of contributors to the method, statistical analysis, but now the method has “hardened” into a set of standard mathematical definitions and procedures. If one uses them correctly, others may dispute some parts of the questions being addressed, but not how the conclusions were reached. A pretty bold statement for a field defined by “…lies, damned lies, and statistics.” There are legitimate ways to dispute meanings drawn from an experiment, but as long as there are no mistakes in the math, the statistical procedures themselves are not in question. If one starts from assumption A, proceeds to gather data about this assumption, develops a hypothesis, then works the statistical procedures on the data and announces the result, the questions are not about the statistical procedures but about the assumption, the data, the hypothesis and the inferences. The dispute can also be about the application of the statistical procedures to the data, but not about the statistical procedures themselves. It is possible to “lie” with statistics by misapplying the procedures to a data set, but, as pointed out by Dr. Michael Starbird in a series of lectures I have been watching, it is easier to lie without statistics.
Dr. Starbird’s lecture series, called Meaning from Data: Statistics Made Clear, is available through The Great Courses, and can be ordered through:
I am enjoying the lectures: he is clearly knowledgeable and presents well. Plus, he was recommended to me by one of my daughters, who was the development editor for a wonderful textbook that he and another professor, Edward B. Burger, wrote, called The Heart of Mathematics, An invitation to effective thinking. I’ve read large portions of that text and enjoyed his and Dr. Burger’s way of presenting “complicated” mathematical concepts.
To put this back together, though, it is in the 6th lecture in Meaning from Data that Dr. Starbird discusses the Bell Curve, also known as the normal curve or the Gaussian curve. He attributes the first working out of the curve to the work Gauss did to locate Ceres. Did Gauss measure where Ceres was? No, that was done by the astronomer, Giuseppe Piazzi, who recorded 24 observations during January and February of 1801. Gauss took the observations, and used the statistics to define the path and the rate of travel on the path, to build a “model” of how Ceres would behave. Nothing proves the value of new mathematical models like successful predictions, and Gauss was just about dead on.
Subsequent to this, Nicollet used the same kinds of statistical methods to deal with the “errors” in Mechain’s observations, and was able to prove that Mechain’s errors were not errors, but variations within the range tolerable for a precise and accurate understanding of the material needed to establish the correct length of the meter.
To now answer the question, “is statistics measurement?”, the full answer is no, but it is a set of mathematical definitions and procedures, concepts if you will, that allow the building of models from measurements.
While I had wanted to put more into this post, this has been a difficult one to write, and probably to read: no pictures. The other information will wait for the next post, when I will have to spend some time on the only other subject in math that invariably (or probably?) makes people nod off to sleep: probability. I will close by repeating my favorite question about the limit of using the concept of averages, originally posed to me by a friend and workmate, Bruce Poole.
If you had your right foot in a bucket of boiling water, and your left foot in a bucket of ice water, on average, would you be comfortable?
To remedy the no-pictures problem, below are two bell curves, also called normal distributions or Gaussian distributions. The first is just the curve5; the second6 shows 3 curves, one with a standard deviation of 1, a second with a standard deviation of 2, and the third with a standard deviation of 3. We will talk more about this in the next post as well.
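For readers who want numbers to go with the pictures, here is a short Python sketch of the bell curve formula. It assumes all three curves are centered on a mean of 0, which the figure may or may not do, and the function name `normal_pdf` is my own.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal (Gaussian) curve at x for a given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


# Assumption: all three curves share a mean of 0; only the spread changes.
for sd in (1, 2, 3):
    peak = normal_pdf(0.0, sd=sd)
    print(f"sd={sd}: peak height {peak:.4f}")
# prints:
# sd=1: peak height 0.3989
# sd=2: peak height 0.1995
# sd=3: peak height 0.1330
```

The larger the standard deviation, the lower and wider the curve, because the total area under every one of these curves is the same.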
1 Whitaker, Robert, The Map Maker’s Wife, Delta Trade Paperbacks, 2004, New York, N.Y. (toises are discussed on p. 48, the degree results are on pp. 166-167)
2 Alder, Ken, The Measure of All Things, Free Press, a division of Simon & Schuster, Inc., 2002 New York, NY pp. 298-299. I added Nicollet’s dates to the quote.
3 Alder, Ken, The Measure of All Things, Free Press, a division of Simon & Schuster, Inc., 2002 New York, NY p.300.
5 Kachigan, Sam Kash, Multivariate Statistical Analysis, A Conceptual Introduction, Second Edition, Radius Press, New York, N.Y., 1991. p.31.
6 Hughes, Ifan G. & Hase, Thomas P.A., Measurements and their Uncertainties, A Practical Guide to Modern Error Analysis, Oxford University Press, Oxford, England, 2010. p. 13.