2010.01.12
Model, Data, Software
Some time ago, a commenter asked my opinion (as a programmer) on the evolving mess that is known as ClimateGate.
My first thought was that the problem was not primarily an issue with bad programming (though there are questions about that), but an issue with the Data and the Model.
[NOTE: many of my links are to other bloggers who were skeptical about the causes and consequences of a planet-wide rise in temperature before the ClimateGate story broke. The discussion below should reach its conclusion without assuming anything about what is causing the globe to warm, how much warming has occurred, or whether it can be expected to continue into the foreseeable future.]
To begin with, computer modeling is not computer programming.
Computer modeling begins with a model that can be written on paper. In essence, computer modeling speeds up work that could be done by hand. A successful computer model takes a problem that could be solved on paper over a period of years, and turns it into a problem that can be solved on a computer in a matter of hours.
Like a model done on paper, the output of a computer model is limited by two things: the quality of the model, and the data it uses.
As an example, an astronomer and mathematician named Johannes Kepler spent many years of his life attempting to model the orbit of Mars. His work would have been shortened drastically if he'd had a slide-rule and a copy of Napier's work on logarithms...and it would have been even shorter if he'd had a computer and a programming assistant.
However, the Computer part of Computer Modeling is secondary to the Model. Before Kepler could do any math (whether on paper or with bits and bytes), he needed a model to work on.
The same holds for the folks at the Climate Research Unit: before they did the computer work, they created a model. Most popular news articles focused on the relationship between carbon dioxide and modeled global climate. Hopefully, the model also contained solar energy input, cloud cover, ice caps, heat energy absorbed and released by the planet's oceans, and dozens of other inputs that my non-climate-scientist mind is unaware of.
However, even the best model in the world can't be used to get good predictions if the data it uses is bad.
When Kepler modeled the orbit of Mars, he used data gathered by fellow astronomer Tycho Brahe. The error between observations of Mars and the predictions given by Kepler's circular-orbit-model varied between 2 and 8 minutes of arc. (Since a minute of arc is 1/60th of a degree, this amounts to 0.0333 to 0.1333 degrees of error.)
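The unit conversion in the parenthesis above can be checked in a few lines of Python. This is only a sketch of the arithmetic; the function name and constant are my own labels, not anything from Kepler or Brahe.

```python
# Hypothetical helper for illustration: converts minutes of arc to decimal degrees.
ARCMIN_PER_DEGREE = 60

def arcmin_to_degrees(arcmin):
    """Convert an angle in minutes of arc to decimal degrees."""
    return arcmin / ARCMIN_PER_DEGREE

# The two error bounds quoted in the text: 2 and 8 minutes of arc.
for arcmin in (2, 8):
    print(f"{arcmin} arcmin = {arcmin_to_degrees(arcmin):.4f} degrees")
```

Run as written, this prints 0.0333 and 0.1333 degrees, matching the figures quoted above.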
If Brahe's observations of the position of Mars in the sky weren't accurate to within about 0.02 degrees (roughly 1.2 minutes of arc), then his data could not have been used to show that Kepler's first model was wrong. While the computation done by Kepler could have been done to a high level of precision, the precision of the original measurement would have limited the precision of the model's result.
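The reasoning in that paragraph can be put as a simple comparison: a mismatch between model and observation only counts against the model if it is larger than the error in the observation itself. The sketch below uses a bare threshold, which is my simplification and not a proper statistical test; the function name is hypothetical, and the 0.2-degree figure is an invented "sloppy instrument" used only for contrast.

```python
def can_reject_model(residual_deg, measurement_error_deg):
    """Return True if the model-vs-observation mismatch is larger than
    the measurement error could explain (a crude threshold check)."""
    return abs(residual_deg) > measurement_error_deg

# Mismatches quoted in the text: 2 and 8 minutes of arc, in degrees.
residuals_deg = [2 / 60, 8 / 60]

# 0.02 degrees is the accuracy discussed in the text;
# 0.2 degrees is an assumed, much worse instrument for comparison.
for error_deg in (0.02, 0.2):
    verdicts = [can_reject_model(r, error_deg) for r in residuals_deg]
    print(f"measurement error {error_deg} deg -> model rejected? {verdicts}")
```

With measurements good to 0.02 degrees, both mismatches stand out and the circular model can be ruled out. With measurements good only to 0.2 degrees, neither mismatch can be told apart from measurement noise, no matter how carefully the arithmetic is done.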
The folks at the CRU had a much more tangled data-set than Kepler did. There is even some question as to whether they still have the original input data, or only kept the adjusted data. Some data stations may have been moved during the period of measurement, and there isn't an international standard for the siting of the stations. In the United States, it is an open question whether the weather-data stations even meet the published standards of the agency which runs them.
Whatever the quality of the model, the results are rendered useless if the input data is garbage.
So, the questions about what the ClimateGate leak reveals are not questions about programming. They are questions about the Model and the Data.
Posted by: karrde at 21:41








