Data availability in climate science: the case of Jones et al 1990 and Nature
Here are a few quick questions:
- Are the instructions in a scientific journal’s manuscript preparation and formatting guide its policy on data availability?
- Are a journal’s instructions for the expeditious processing of a paper conditions for publication?
- Can the author of a scientific paper say: “when I published my paper, the journal did not require data to be submitted as a pre-condition of publication so I don’t have to give it to you now”?
- Is it demeaning to an author to raise questions about a journal requiring that author to provide raw data?
- Did journals, at a time when they were not bound by formal policies, fail to subscribe to commonly accepted principles of scientific publication?
If you answer ‘yes’ to all these questions, you would be in the select company of a few who shield authors from making their data available, even much against the scientists’ own judgments.
The issues swirling around the Jones et al 1990 paper in Nature on the urban heat island effect are “strange” and the circumstances complex; if one believes Fred Pearce at the Guardian, for example, it is “difficult to imagine a more bizarre academic dispute”. The paper continues to be cited as a primary reference in its field; for example, it was cited in the fourth assessment report of the IPCC, of which Jones is listed as an author. What got the ball rolling this time was a response to a post at the Bishop Hill blog questioning a Nature magazine exhortation directed at critics of the CRU apropos access to scientific data.
Our focus is therefore narrow, and the case in front of us can be stated thus: Jones published his paper in Nature many years ago, in 1990. The only condition Nature had for papers in 1990 was to make ‘sequence and x-ray crystallography data’ available. Since Jones’ paper contained temperature and related data, it did not fall under this condition. Therefore Jones has no ‘obligation’ to provide this data at the journal’s request. Nature required all authors to make data available as a condition for publication only from 1997 onwards, so it could not require Jones to make data available on request. Moreover, since this was all eons ago in the digital stone age, hardly anyone can expect Phil Jones to share data from a time when computer storage was so expensive and occupied so much physical space. Jones does not owe anyone an explanation for the questions that have arisen about his 1990 urban heat island paper, nor should the journal be drawn in.
The problem is that almost everything in the above is wrong.
It is perhaps a good idea to clear the air on the facts first, and as we shall see, this has a bearing on the way this argument has been put forth.
Nature’s request to ‘deposit sequence and x-ray crystallography data in databases’ was not present in the journal in 1990, even in October; it appeared only later. When it did appear, it was in the journal’s ‘guide to authors’ for preparing manuscripts. The guide dealt with how to typeset the manuscript, where to send it, what to put on the envelope label, how to do figures etc. The request (which did not even exist in 1990) was not a condition for publication. It was not part of a data policy. Nature had no formal editorial policy or data policy at that time. Nor did the journal set any conditions for publication.
What was Nature’s position?
What was Nature’s editorial position about issues of replication of research findings, scientific fraud and data availability in the early 90’s? Did the absence of a formal policy mean a total absence of norms?
Quite to the contrary: for Nature’s stalwart editor John Maddox, science was a gentlemen’s affair, to be played by principles which were widely understood and accepted. Maddox believed journals had only a small role in policing authors or keeping their conscience. Changes were afoot in the very period Jones’ paper was published. One point key to the present debate has to be noted. Although Maddox’s resistance to formal data requirements for publication was grounded in a context entirely different from the spectre being raised presently, his concerns about data access presaged current views accurately.
Protein crystallography and genomic research had reached a watershed moment around 1990. The debate, common to both disciplines, was about how, why and when to share data, and about who would regulate and co-ordinate it. Issues of data ownership and access were beginning to bother other scientific communities. There was increasing recognition that erroneous or bad data, once published, could set researchers down wrong paths for years. The similarities to the present debate are instructive.
In 1989, the International Union of Crystallography (IUCr) wanted journals to require authors to submit primary data to facilitate verification. DNA researchers were on a similar path. Cold Spring Harbor Lab wanted Nature to require submission as a condition; authors from the Brookhaven protein data bank wanted the same. For instance, in a letter to Nature, Richard Roberts of the CSHL wrote that he was “appalled” by Maddox’s comments and called his reasons for not setting conditions “a potpourri of excuses for inaction”. Researchers wanted sequences in databases before papers were published; they would be in computer form to begin with, he pointed out.
What prompted the strong reaction? In an earlier editorial, Maddox had spelt out his reasons: databases were proliferating, poor countries like India might not be able to comply with such conditions, and commercial outlets might not be able to submit. But his main reason for not setting conditions was once again clear: journals ought not to play “policemen” or “adjudicate on authors’ subsequent conduct”. Grant-making agencies were better placed for this purpose. The first duty of journals was, he wrote, “to speed up the process of publication”, but also to “ensure the integrity of what they publish (as far as possible)”, which gave them a “right to ask for supporting data (sequences, atomic co-ordinates)”. Of the latter he wrote that journals should “do that more often” and “arrange to provide access to them”. In the general sense, Maddox had begun stating that “data arising in the course of discovery should be generally available” and that access was “especially important when a claim can be checked only by reference to the original data”.
In October 1990, reporting his impressions of a meeting at Bethesda with the Vancouver group, Maddox noted a “general applause for data-sharing” with acknowledgement that it was “rarely as simple” as that. Medical research journals had just begun accepting the Vancouver requirement for sharing research materials as a pre-condition for publication. Maddox appeared to shake his head, but he saw the trend.
In the end Maddox relented, if only slightly. By January 1991 the passage about nucleotide and protein data had found its way into Nature’s guide for authors. Maddox, however, held to his promise. Just as he had said earlier that Nature would “urge on its contributors the importance of submitting their data to the databanks” but not “exact unenforceable promises”, it was set out as a request to authors and not a condition for publication. These were the specific circumstances surrounding the request for sequence and protein data.
Why did he resist pre-publication conditions? Maddox’s reasons remained the same: “rules”, he had written earlier, only ran the risk of the “recurrent need to break them” and of “burying communication between astonishingly creative people”; the journal would set ‘as few obstacles as possible between an author and publication’. The first task was to get findings out. How would the journal deal with misconduct? Nature, Maddox declared, would “rather rely on exhortation and the occasional admonitory illustrative example” and “look into suspicious circumstances arising from its own postbag”.
All through this debate, it must be pointed out quite explicitly, the basic matter of replication of scientific findings was never in question for John Maddox or Nature. That authors or advocates would employ a journal’s rules to deny permission to those who sought to replicate findings was never the issue. Even as he pondered how much information a paper ought to make available for others, Maddox was clear:
When a fellow scholar, after a few careful readings of the text, is persuaded that he can so fully understand what has been done, and why, that he could march to his own laboratory and repeat the observations he has seen described with a fair chance of replicating their authors’ results successfully. Naturally, this does not often happen. The more complete a paper, the less likely are its readers likely to strive at its replication. But the principle supervenes: to be credible, a paper must contain the essence of its replication. On that, everybody agrees.
It is therefore interesting to note how things have turned full circle: an apparent absence of rules is now being employed to bury communication between people.
Who agrees with such a narrow interpretation of data?
While it is important to illuminate historical events, what concerns us is how this problem is seen today with regard to Nature and the Jones et al 1990 paper. On that count: who agrees with the interpretation we encountered earlier?
Not Phil Jones, the author of the paper in question. Jones had been confronted by requests for the same data before. Although he exhibited some antediluvian tendencies towards scientific data, Jones never once gave the excuse that his paper in Nature was covered under a data policy that exempted him from sharing it. Nor did he suggest that since Nature had no policy at the time of publication he was not obliged.
Phil Jones’ higher-ups at the University of East Anglia (UEA) did not. In responding to a FOI request from Steve McIntyre, the Climatic Research Unit (CRU) of the university eventually released the Chinese temperature data to McIntyre and the public. Mercifully enough, the data was in digital binary format and not on paper tape.
Finally, Nature magazine itself does not. In 2008, Nature was queried about a materials complaint regarding this same Jones et al 1990 paper. The response indicates that Nature would apply its current editorial position to the journal’s prior papers, including those from 1990. As long as the paper was not superseded by later work and remained scientifically relevant, the journal would get involved. This is completely contrary to the tunnel-vision expectation that papers published in a journal under a presumed data policy are bound only by that policy. Nature did not exempt itself by pointing to any official reasons. So the very journal whose ‘guide to authors’ was mistaken for its policy does not subscribe to such an interpretation.
In the end the question remains to be asked: why believe that sequence and crystallography data were the only items Nature mandated authors make available? Why else, but to exempt Jones from imaginary conditions, seeing how narrow and well-suited to this purpose such a misreading of Nature’s guidelines turns out to be. What is more, there is a continued belief that data on the Chinese network of stations belongs to Jones, and that he can choose at his personal discretion, unburdened by any scientific principles, whether to hand over such data at all and to whom. It is likely there are many in the climate establishment who feel this way. This should change; the three inquiries that looked into the science of the CRU were unanimous.
Will further narrow and petty defenses be invoked? These are purposefully thrown into the debate to widen the chasm between amateur scientists, skeptical observers and climate scientists. We need to provide all assistance possible to climate scientists in thwarting such efforts.
 F. Pearce. (2010). Changing weather posts in China led to accusations of scientific fraud.
 EliRabett. (2010). The Wayforward Machine.
 A. Montford. (2010). Nature Editorial on Climategate.
 “Closing the Climategate,” Nature, vol. 468, p. 345, 2010.
 L. Roberts, “Controversial From the Start,” Science, vol. 291, pp. 1182-1188, February 16, 2001.
 G. C. Anderson, “But what is the problem?,” Nature, vol. 345, p. 8, 1990.
 C. McGourty, “Who’s hiding primary data?,” Nature, vol. 341, p. 94, 1989.
 R. J. Roberts, “Benefits of databases,” Nature, vol. 342, p. 114, 1989.
 J. Maddox, “Making good databanks better,” Nature, vol. 341, p. 277, 1989.
 J. Maddox, “Should camp-followers be policemen?,” Nature, vol. 348, p. 107, 1990.
 S. McIntyre. (2007). East Anglia Refusal Letter.
 P. D. Jones. (2007). Data used in the Jones et al. (1990) publication.