The 100 most-cited scientific papers
- 30 Oct 2014
- By David Shultz
Here at Science we love ranking things, so we were thrilled with this list of the top 100 most-cited scientific papers, courtesy of Nature. Surprisingly absent are many of the landmark discoveries you might expect, such as the discovery of DNA's double helix structure. Instead, most of these influential manuscripts are slightly more utilitarian in nature. For example, item No. 1, with more than 300,000 citations: Protein measurement with the Folin phenol reagent. Perhaps importance isn't always sexy.
What Are The Most Cited Research Papers Of All Time?
The writers at Nature News recently put together a list of the 100 most highly cited papers of all time. There are a few surprises in here, including the fact that it takes no fewer than 12,119 citations to rank in the top 100.
The list, which was created by pulling data from the Science Citation Index (SCI), spans the last 100 years of scholarly publications. The sheer size of the literature — 58 million items — shows that the top 100 papers are true outliers; just three publications have more than 100,000 citations. Many of the world's most famous and influential papers didn't even make the cut.
To help put it all into perspective, Nature News put together a video (above) and this infographic (click to embiggen).
If the cumulative stack of all these papers were scaled to the size of Mount Kilimanjaro, then the 100 most-cited papers would represent just one centimeter at the peak. Only 14,499 papers — about a meter and a half's worth — have more than 1,000 citations. Roughly half of the items have only one citation.
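The scaling behind this analogy is easy to check. The sketch below assumes a summit height of about 5,895 m for Kilimanjaro (a figure the article does not give) and uses the 58 million items quoted above; the function and variable names are purely illustrative.

```python
# Rough check of the Kilimanjaro analogy.
# Assumption (not from the article): Kilimanjaro is about 5,895 m high.
KILIMANJARO_M = 5895.0
TOTAL_ITEMS = 58_000_000  # size of the literature quoted in the article


def stack_height_m(n_papers: int) -> float:
    """Height, in metres, that n_papers would occupy in the scaled stack."""
    return KILIMANJARO_M * n_papers / TOTAL_ITEMS


top_100_cm = stack_height_m(100) * 100
over_1000_cites_m = stack_height_m(14_499)

print(f"Top 100 papers: {top_100_cm:.2f} cm")
print(f"Papers with >1,000 citations: {over_1000_cites_m:.2f} m")
```

Under that assumed height, the two figures come out at roughly one centimeter and a meter and a half, in line with the article.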
So what's the most cited paper of all time? That distinction goes to a 1951 paper by U.S. biochemist Oliver Lowry and colleagues describing an assay to determine the amount of protein in a solution. To date, it has collected more than 305,000 citations. And no one's entirely sure why...
Here are the top five:
Shockingly, Watson and Crick's paper on the structure of DNA missed out (just 5,207 citations... what!?), along with many other historic breakthroughs (like the 1985 discovery of the ozone hole — just 1,871 citations). Instead, papers on methods and software dominate the list.
Here's a breakdown of the citations by category:
You can browse through the entire list here (.xls spreadsheet) or via Nature News's interactive graphic. And there's much more at the Nature News article.
- 12 September 2018
Thousands of scientists publish a paper every five days
- John P. A. Ioannidis,
- Richard Klavans &
- Kevin W. Boyack
John P. A. Ioannidis is a professor of medicine at the Meta-Research Innovation Center at Stanford (METRICS), Stanford University, California.
Richard Klavans is a researcher at SciTech Strategies in Philadelphia, Pennsylvania, and New Mexico.
Kevin W. Boyack is a researcher at SciTech Strategies in Philadelphia, Pennsylvania, and New Mexico.
Illustration by David Parkins
Authorship is the coin of scholarship — and some researchers are minting a lot. We searched Scopus for authors who had published more than 72 papers (the equivalent of one paper every 5 days) in any one calendar year between 2000 and 2016, a figure that many would consider implausibly prolific [1]. We found more than 9,000 individuals, and made every effort to count only ‘full papers’ — articles, conference papers, substantive comments and reviews — not editorials, letters to the editor and the like. We hoped that this could be a useful exercise in understanding what scientific authorship means.
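The threshold is simple arithmetic: 365 days divided by 72 papers is just over 5 days per paper. As a minimal sketch of the kind of filter described, assuming a hypothetical list of (author, year, full-paper count) records rather than real Scopus data, and with an invented function name:

```python
from collections import defaultdict

THRESHOLD = 72  # "more than 72 papers" in one calendar year

# Hypothetical records: (author, year, number of full papers). Not real data.
records = [
    ("Author A", 2004, 80),
    ("Author A", 2005, 40),
    ("Author B", 2010, 72),   # exactly 72 does not exceed the threshold
    ("Author C", 2016, 110),
    ("Author C", 1999, 200),  # outside the 2000-2016 window
]


def hyperprolific(rows, first_year=2000, last_year=2016, threshold=THRESHOLD):
    """Map each author to the years in which they exceeded the threshold."""
    hits = defaultdict(list)
    for author, year, n_papers in rows:
        if first_year <= year <= last_year and n_papers > threshold:
            hits[author].append(year)
    return dict(hits)


days_per_paper = 365 / THRESHOLD
result = hyperprolific(records)
print(f"{days_per_paper:.2f} days per paper")
print(result)
```

This only illustrates the counting rule. Deciding what counts as a ‘full paper’ is the harder part, and it is not modelled here.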
Nature 561 , 167-169 (2018)
Wager, E., Singhvi, S. & Kleinert, S. PeerJ 3, e1154 (2015).
Quan, W., Chen, B. & Shu, F. Preprint at https://arxiv.org/abs/1707.01162 (2017).
Hvistendahl, M. Science 342, 1035–1039 (2013).
Nature 483, 246 (2012).
Abritis, A., McCook, A. & Retraction Watch. Science 357, 541 (2017).
Patience, G. S., Galli, F., Patience, P. A. & Boffito, D. C. Preprint at https://doi.org/10.1101/323519 (2018).
Drenth, J. J. Am. Med. Assoc. 280, 219–221 (1998).
Sauermann, H. & Haeussler, C. Sci. Adv. 3, e1700404 (2017).
Kim, S. K. PLoS One 13, e0200785 (2018).
Papatheodorou, S. I., Trikalinos, T. A. & Ioannidis, J. P. J. Clin. Epidemiol. 61, 546–551 (2008).
Which of the many thousands of papers on climate change published each year in scientific journals are the most successful? Which ones have done the most to advance scientists’ understanding, alter the course of climate change research, or inspire future generations?
On Wednesday, Carbon Brief will reveal the results of our analysis into which scientific papers on the topic of climate change are the most “cited”. That means, how many times other scientists have mentioned them in their own published research. It’s a pretty good measure of how much impact a paper has had in the science world.
But there are other ways to measure influence. Before we reveal the figures on the most-cited research, Carbon Brief has asked climate experts what they think are the most influential papers.
We asked all the coordinating lead authors, lead authors and review editors on the last Intergovernmental Panel on Climate Change (IPCC) report to nominate three papers from any time in history. This is the exact question we posed:
What do you consider to be the three most influential papers in the field of climate change?
As you might expect from a broad mix of physical scientists, economists, social scientists and policy experts, the nominations spanned a range of topics and historical periods, capturing some of the great climate pioneers and the very latest climate economics research.
Here’s a link to our summary of who said what. But one paper clearly takes the top spot.
Winner: Manabe & Wetherald (1967)
With eight nominations, a seminal paper by Syukuro Manabe and Richard T. Wetherald published in the Journal of the Atmospheric Sciences in 1967 tops the Carbon Brief poll as the IPCC scientists’ top choice for the most influential climate change paper of all time.
Entitled, “Thermal Equilibrium of the Atmosphere with a Given Distribution of Relative Humidity”, the work was the first to represent the fundamental elements of the Earth’s climate in a computer model, and to explore what doubling carbon dioxide (CO2) would do to global temperature.
Manabe & Wetherald (1967), Journal of the Atmospheric Sciences
The Manabe & Wetherald paper is considered by many as a pioneering effort in the field of climate modelling, one that effectively opened the door to projecting future climate change. And the value of climate sensitivity is something climate scientists are still grappling with today.
Prof Piers Forster, a physical climate scientist at Leeds University and lead author of the chapter on clouds and aerosols in working group one of the last IPCC report, tells Carbon Brief:
This was really the first physically sound climate model allowing accurate predictions of climate change.
The paper’s findings have stood the test of time amazingly well, Forster says.
Its results are still valid today. Often when I think I’ve done a new bit of work, I find that it had already been included in this paper.
Prof Steve Sherwood, expert in atmospheric climate dynamics at the University of New South Wales and another lead author on the clouds and aerosols chapter, says it’s a tough choice, but Manabe & Wetherald (1967) gets his vote, too. Sherwood tells Carbon Brief:
[The paper was] the first proper computation of global warming and stratospheric cooling from enhanced greenhouse gas concentrations, including atmospheric emission and water-vapour feedback.
Prof Danny Harvey, professor of climate modelling at the University of Toronto and lead author on the buildings chapter in the IPCC’s working group three report on mitigation, emphasises the Manabe & Wetherald paper’s impact on future generations of scientists. He says:
[The paper was] the first to assess the magnitude of the water vapour feedback, and was frequently cited for a good 20 years after it was published.
Tomorrow, Carbon Brief will be publishing an interview with Syukuro Manabe, alongside a special summary by Prof John Mitchell, the Met Office Hadley Centre’s chief scientist from 2002 to 2008 and director of climate science from 2008 to 2010, on why the paper still holds such significance today.
Joint second: Keeling, C.D. et al. (1976)
Jumping forward a decade, a classic paper by Charles Keeling and colleagues in 1976 came in joint second place in the Carbon Brief survey.
Published in the journal Tellus under the title, “Atmospheric carbon dioxide variations at Mauna Loa observatory,” the paper documented for the first time the stark rise of carbon dioxide in the atmosphere at the Mauna Loa observatory in Hawaii.
A photocopy of Keeling et al., (1976) Source: University of California, Santa Cruz
Dr Jorge Carrasco, Antarctic climate change researcher at the University of Magallanes in Chile and lead author on the cryosphere chapter in the last IPCC report, tells Carbon Brief why the research underpinning the “Keeling Curve” was so important.
This paper revealed for the first time the observed increase in atmospheric CO2 as the result of the combustion of carbon, petroleum and natural gas.
Prof David Stern, energy and environmental economist at the Australian National University and lead author on the Drivers, Trends and Mitigation chapter of the IPCC’s working group three report, also chooses the 1976 Keeling paper, though he notes:
This is a really tough question as there are so many dimensions to the climate problem – natural science, social science, policy etc.
With the Mauna Loa measurements continuing today, the so-called “Keeling curve” is the longest continuous record of carbon dioxide concentration in the world. Its historical significance and striking simplicity have made it one of the most iconic visualisations of climate change.
Source: US National Oceanic and Atmospheric Administration (NOAA)
Also in joint second place: Held, I.M. & Soden, B.J. (2006)
Fast forwarding a few decades, in joint second place comes a paper by Isaac Held and Brian Soden published in the Journal of Climate in 2006.
The paper, “Robust Responses of the Hydrological Cycle to Global Warming”, identified how rainfall from one place to another would be affected by climate change. Prof Sherwood, who nominated this paper as well as the winning one from Manabe and Wetherald, tells Carbon Brief why it represented an important step forward. He says:
[This paper] advanced what is known as the “wet-get-wetter, dry-get-drier” paradigm for precipitation in global warming. This mantra has been widely misunderstood and misapplied, but was the first and perhaps still the only systematic conclusion about regional precipitation and global warming based on robust physical understanding of the atmosphere.
Held & Soden (2006), Journal of Climate
Rather than choosing a single paper, quite a few academics in our survey nominated one or more of the Working Group contributions to the last IPCC report. A couple even suggested the Fifth Assessment Report in its entirety, running to several thousands of pages. The original IPCC report, published in 1990, also got mentioned.
It was clear from the results that scientists tended to pick papers related to their own field. For example, Prof Ottmar Edenhofer, chief economist at the Potsdam Institute for Climate Impact Research and co-chair of the IPCC’s Working Group Three report on mitigation, selected four papers from the last 20 years on the economics of climate change costs versus risks, recent emissions trends, the technological feasibility of strong emissions reductions and the nature of international climate cooperation.
Taking a historical perspective, a few more of the early pioneers of climate science featured in our results, too. For example, Svante Arrhenius’ famous 1896 paper on the Greenhouse Effect, entitled “On the influence of carbonic acid in the air upon the temperature of the ground”, received a couple of votes.
Prof Jonathan Wiener, environmental policy expert at Duke University in the US and lead author on the International Cooperation chapter in the IPCC’s working group three report, explains why this paper should be remembered as one of the most influential in climate policy. He says:
[This is the] classic paper showing that rising greenhouse gas concentrations lead to increasing global average surface temperature.
Svante Arrhenius (1896), Philosophical Magazine
A few decades later, a paper by Guy Callendar in 1938 linked the increase in carbon dioxide concentration over the previous 50 years to rising temperatures. Entitled, “The artificial production of carbon dioxide and its influence on temperature,” the paper marked an important step forward in climate change research, says Andrew Solow, director of the Woods Hole Marine Policy centre and lead author on the detection and attribution of climate impacts chapter in the IPCC’s working group two report. He says:
There is earlier work on the greenhouse effect, but not (to my knowledge) on the connection between increasing levels of CO2 and temperature.
Though it may feature in the climate change literature hall of fame, this paper raises a question about how to define a paper’s influence, says Forster. Rather than being celebrated among his contemporaries, Callendar’s work achieved recognition a long time after it was published. Forster says:
I would have loved to have chosen Callendar (1938) as the first attribution paper that changed the world. Unfortunately, the 1938 effort of Callendar was only really recognised afterwards as being a founding publication of the field … The same comment applies to earlier Arrhenius and Tyndall efforts. They were only influential in hindsight.
Guy Callendar and his 1938 paper in Quarterly Journal of the Royal Meteorological Society
Other honourable mentions in the Carbon Brief survey of most influential climate papers go to Norman Phillips, whose 1956 paper described the first general circulation model, William Nordhaus’s 1991 paper on the economics of the greenhouse effect, and a paper by Camille Parmesan and Gary Yohe in 2003, considered by many to provide the first formal attribution of climate change impacts on animal and plant species.
Finally, James Hansen’s 2012 paper, “Public perception of climate change and the new climate dice”, was important in highlighting the real-world impacts of climate change, says Prof Andy Challinor, expert in climate change impacts at the University of Leeds and lead author on the food security chapter in the working group two report. He says:
[It] helped with demonstrating the strong links between extreme events this century and climate change. Result: more clarity and less hedging.
Marc Levi, a political scientist at Columbia University and lead author on the IPCC’s human security chapter, makes a wider point, telling Carbon Brief:
The importance is in showing that climate change is observable in the present.
Indeed, attribution of extreme weather continues to be at the forefront of climate science, pushing scientists’ understanding of the climate system and modern technology to their limits.
Look out for more on the latest in attribution research as Carbon Brief reports on the Our Common Futures Under Climate Change conference taking place in Paris this week.
Pinning down which climate science papers most changed the world is difficult, and we suspect climate scientists could argue about this all day. But while the question elicits a range of very personal preferences, stories and characters, one paper has clearly stood the test of time and emerged as the popular choice among today’s climate experts – Manabe and Wetherald, 1967.
Main image: Satellite image of Hurricane Katrina.
Statistical Modeling, Causal Inference, and Social Science
The most-cited statistics papers ever.
Robert Grant has a list. I’ll just give the ones with more than 10,000 Google Scholar cites:
Cox (1972) Regression and life tables: 35,512 citations.
Dempster, Laird, Rubin (1977) Maximum likelihood from incomplete data via the EM algorithm: 34,988
Bland & Altman (1986) Statistical methods for assessing agreement between two methods of clinical measurement: 27,181
Geman & Geman (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images: 15,106
We can find some more via searching Google scholar for familiar names and topics; thus:
Metropolis et al. (1953) Equation of state calculations by fast computing machines: 26,000
Benjamini and Hochberg (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing: 21,000
White (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity: 18,000
Heckman (1977) Sample selection bias as a specification error: 17,000
Dickey and Fuller (1979) Distribution of the estimators for autoregressive time series with a unit root: 14,000
Cortes and Vapnik (1995) Support-vector networks: 13,000
Akaike (1973) Information theory and an extension of the maximum likelihood principle: 13,000
Liang and Zeger (1986) Longitudinal data analysis using generalized linear models: 11,000
Breiman (2001) Random forests: 11,000
Breiman (1996) Bagging predictors: 11,000
Newey and West (1986) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix: 11,000
Rosenbaum and Rubin (1983) The central role of the propensity score in observational studies for causal effects: 10,000
Granger (1969) Investigating causal relations by econometric models and cross-spectral methods: 10,000
Hausman (1978) Specification tests in econometrics: 10,000
And, the two winners, I’m sorry to say:
Baron and Kenny (1986) The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations: 42,000
Zadeh (1965) Fuzzy sets: 45,000
But I’m guessing there are some biggies I’m missing. I say this because Grant’s original list included one paper, by Bland and Altman, with over 27,000 cites, that I’d never heard of!
P.S. I agree with Grant that using Google Scholar favors newer papers. For example, Cooley and Tukey (1965), “An algorithm for the machine calculation of complex Fourier series,” does not make the list, amazingly enough, with only 9300 cites. And the hugely influential book by Snedecor and Cochran has very few cites, I guess cos nobody cites it anymore. And, of course, the most influential researchers such as Laplace, Gauss, Fisher, Neyman, Pearson, etc., don’t make the cut. If Pearson got a cite for every chi-squared test, Neyman for every rejection region, Fisher for every maximum-likelihood estimate, etc., their citations would run into the mid to high zillions each.
P.P.S. I wrote this post a few months ago so all the citations have gone up. For example, the fuzzy sets paper is now listed at 49,000, and Zadeh has a second paper, “Outline of a new approach to the analysis of complex systems and decision processes,” with 16,000 cites. He puts us all to shame. On the upside, Efron’s 1979 paper, “Bootstrap methods: another look at the jackknife,” has just pulled itself over the 10,000 cites mark. That’s good. Also, I just checked and Tibshirani’s paper on lasso is at 9873, so in the not too distant future it will make the list too.
69 thoughts on “The most-cited statistics papers ever”
Interesting that the ’95 Vapnik/Cortes paper is on here, but not the earlier, more foundational ’92 paper with Vapnik/Guyon/Boser
Cronbach (1951) Coefficient alpha and the internal structure of tests 22,000
Rumor has it that if you do a Heckman correction without citing his paper, James Heckman will personally call you up and complain.
I’d respond to that one but I don’t want to get run down in a parking lot.
Here is another one: Confidence limits on phylogenies: an approach using the bootstrap by Felsenstein ~25k on GS
I also used GS to try to recreate the citation profile for R.A. Fisher, as in your P.S., here: http://simplystatistics.org/2012/03/07/r-a-fisher-is-the-most-influential-scientist-ever/
This example illustrates how certain fields such as biology have so many cites. I recall learning a few years ago that low-ranking bio journals have higher impact factors than high-ranking statistics journals. There’s something just so wrong about an application to the bootstrap having so many more citations than the original bootstrap paper!
And, looking down on the same page as that paper on Google scholar, I find this one:
12,000 citations: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods K Tamura, D Peterson, N Peterson, G Stecher… – Molecular biology and …, 2011 – SMBE
9000 citations: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood S Guindon, O Gascuel – Systematic biology, 2003 – sysbio.oxfordjournals.org
I just refuse to count these as among the most-cited statistics papers!
Surprise surprise we disagree! :-)
I think that the process of focusing on a specific application can be just as important as writing down the most general case. It is a disservice to those fields and to our field to discount them. As a case in point, that phylogenies paper actually inspired even more general statistical work on the “problem of regions” http://projecteuclid.org/euclid.aos/1024691353 .
If you don’t count those as most cited statistics papers you should remove all citations to the above papers that came from someone’s software that implemented those methods. How many citations would GEE/Kaplan Meier/Cox regression/FDR/Arch/Bagging/Boosting have without the software that allowed users to implement them/use them?
I don’t disagree that these papers are useful, I just think you have to draw the line somewhere.
Here’s how I see it: if someone develops new statistical theory or methods, then that’s a statistics paper, and I count as citations all the other statistics papers that cite it, and also all the applied papers that cite it. But if someone takes an existing statistical method and ports it to another field, I don’t consider it eligible for the “most cited statistics papers.”
The econometrics papers on the list, by the way, I do not consider as porting of existing statistical methods. On the contrary, those econometrics papers are statistics research that happen to appear in non-statistics journals.
“But if someone takes an existing statistical method and ports it to another field, I don’t consider it eligible.” Doesn’t that mean that if you looked hard enough, none of them would be eligible?
Speaking of psychology:
* 16,384 cites: Campbell, D. T., Stanley, J. C., & Gage, N. L. (1963). Experimental and quasi-experimental designs for research (pp. 171-246). Boston: Houghton Mifflin.
* 12,644 cites: Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological bulletin, 56, 81-105.
13,553 cites: Cohen, J. (1992). A power primer. Psychological bulletin.
10,430 cites: Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological bulletin.
Continuing along this theme, I google-scholar’d *maximum likelihood* and this appeared:
11,000 citations: Refinement of macromolecular structures by the maximum-likelihood method GN Murshudov, AA Vagin, EJ Dodson – Acta Crystallographica Section D: …, 1997 – iucr.org
And a search on *Bayes* yields:
13,000 citations: MRBAYES: Bayesian inference of phylogenetic trees JP Huelsenbeck, F Ronquist – Bioinformatics, 2001 – webpages.icav.up.pt
and another 14,000 for MrBayes 3: Bayesian phylogenetic inference under mixed models F Ronquist, JP Huelsenbeck – Bioinformatics, 2003 – Oxford Univ Press
You get the picture.
But then I found this one, which I’d never heard of and has an amazing 24,000 citations:
A new look at the statistical model identification H Akaike – Automatic Control, IEEE Transactions on, 1974 – ieeexplore.ieee.org
Never heard of Akaike 1974? But it’s a citation classic …
I know that you have read a paper referring to Akaike, 1974 :)
What’s wrong is paying too much attention to citation counts. If I was famous & published some egregiously crappy heresy I bet I’d get cited a lot too. For the wrong reasons, of course.
How many cites did Daryl Bem get for his paper?
Kaplan & Meier (1958) Nonparametric estimation from incomplete observations : 43,017
D’oh indeed! How did I not…
Bland/Altman is the most beloved medical researchers’ paper; the plot of a-b against a+b is something like the unnormalized residual plot, and supposedly is called Tukey plot elsewhere (never saw this, however).
When medical researchers review statistics (they do it a lot, I fear), they check for the presence of the following “always correct” terms:
a) p-values
b) t-test
c) p-values
d) wilcoxon
e) p-values
f) Bland-Altman plots
g) p-values
h) Semi-professionals: Area under curve (AUC)
i) p-values
Everything else should be returned, because it’s modernistic stuff nobody understands. If you are lucky, the journal has a statistical reviewer.
Ha! How true. On the subject of p-value obsession, I note in today’s news that Earl Grey tea reduces cholesterol. No wonder nobody believes us.
Bland and Altman are slightly chagrined that their contribution is considered to be the Bland-Altman plot. Bland says “we never claimed priority for plotting difference versus mean and that adding the limits of agreement was our contribution”.
Jeremy, I fully agree. Both Bland and Altman have published the most lucid papers on statistics that can be understood by mere mortals. And Altman is an excellent reviewer.
What we really need is a cumulative citation index. Take the bootstrap as an example. Lots of people use the bootstrap but have never read the original article. There’s a good chance they read one of the multitude of “Using the bootstrap in the field of X.” The best ideas experience this second wave of dissemination, papers that don’t really add anything new, but recast and reword the original work to appeal to a broader audience.
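One possible reading of the “cumulative citation index” suggested here, offered only as an illustrative sketch: credit a paper with every later paper that reaches it through a chain of citations, not just those that cite it directly. The toy citation graph and all names below are invented.

```python
# Toy citation graph: each key cites the papers in its list. Invented data.
cites = {
    "bootstrap_original": [],
    "bootstrap_in_field_x": ["bootstrap_original"],
    "bootstrap_in_field_y": ["bootstrap_original"],
    "applied_paper_1": ["bootstrap_in_field_x"],
    "applied_paper_2": ["bootstrap_in_field_x", "bootstrap_in_field_y"],
    "applied_paper_3": ["bootstrap_original"],
}


def direct_citations(target, graph):
    """Papers that cite target directly."""
    return {paper for paper, refs in graph.items() if target in refs}


def cumulative_citations(target, graph):
    """Papers that reach target through any chain of citations."""
    reached = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for citing in direct_citations(current, graph):
            if citing not in reached:
                reached.add(citing)
                frontier.append(citing)
    return reached


direct = len(direct_citations("bootstrap_original", cites))
cumulative = len(cumulative_citations("bootstrap_original", cites))
print(direct, cumulative)
```

In this toy graph the original paper has three direct citations but five cumulative ones, because two applied papers only cite the field-specific “second wave” articles.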
On Baron and Kenny, I suspect at least 1/2 of the citations are refuting the work. Any time one prominent article comes out applying mediation analysis, there are at least 5 papers written to refute the original BK approach.
What I think we need with some urgency is a (possibly crowd-sourced) database of papers indexed by the statistical methods used. That way not only could you trace what people learnt (or didn’t) over time about how to do Method X, you’d also be able to find nice and nasty examples of writing it up.
Aiken & West (1991) – 21,606 – Multiple Regression: Testing and Interpreting Interactions Or are we not doing books? :-) Neat post!
Especially the works about interactions/mediation etc. seem to be cited very often indeed. Is there a special reason for that? And why on earth are so many publications wrong if the basic articles are cited so often? Well, I suppose because nobody is actually reading them…
If you see Baron and Kenny cited nowadays, it tends to be to say “The Baron and Kenny approach to mediation is known to be underpowered and so we used …”, but if you don’t mention Baron and Kenny, reviewers ask why not.
Actually my methods professor (in sociology) cited them and used their approach in addition to the Sobel test… thus I also used them in – I think – two assignments, maybe even in my Bachelor thesis, I don’t remember it exactly (only that I also mentioned something about Hayes and Preacher and bootstrapping being a better alternative…). Anyway, that’s not so much my point. There are obviously some influential papers about interaction and mediation which are cited very often but on the other hand a really huge amount of papers in Sociology are making basic mistakes in interpreting interactions, like interpreting the coefficient of X or Z as “main effects” even if X*Z is part of the model and neither X nor Z could equal zero. It’s probably at least as bad as shown by Brambor et al. (2006) for the political sciences…
This weekend I learned how these citation indices have exploded over the last ten years. (Put in various high profile individuals – or even average people – and see how citations have accelerated exponentially.) This is all good and understandable in terms of the internet and the rise of statistics etc etc. But is it also a sign of publication inflation? Publications have a kind of currency value in academia. The trend feels bubble like – although as has been discussed here before publications don’t really have a historical metric or intrinsic value to be constrained. Has the value of scientific output expanded by a factor of 5 to 10 over the last decade? To me these citation trends are simultaneously exciting and worrying.
There is a quote out there that goes something like: “Soon the journal articles will be being published faster than the speed of light, fortunately this does not violate the laws of physics since they convey no usable information.” I couldn’t find it right now, but remember the original being worded better.
C’mon, Andrew. No love for Baron & Kenny? ;-)
Altman has one of the largest h-indices of anyone I’ve ever seen:
If that impresses you, then you may find this article interesting “A list of highly influential biomedical researchers, 1996-2011”
Breiman, Classification and Regression Trees, 1984: 24,011 citations
Nope, I already said this in the post: No books. There are tons of books with more than 10,000 citations. For this discussion, we’re restricting to articles.
Fair enough on no books, but Snedecor and Cochran mentioned by you earlier was a book.
Yes, but Snedecor and Cochran was not on the list. I just gave its citations for a comparison point.
I tabulated the journals the papers at the top (excluding those in the comments as I’d got bored by then) appeared in, and it’s interesting how few appeared in the ‘top’ statistics methodology journals: JRSSB got 3, JASA 1, Biometrika 2, Ann Stat 0. In contrast, Econometrica has 5, Machine Learning 3, and even the Lancet 1 (Bland and Altman). And yet I know of statistics departments where they’re only looking for papers in the ‘top’ methodology journals when considering prospective faculty, and anything else is disregarded.
R Ihaka, R Gentleman, R: a language for data analysis and graphics (7203) RDC Team, R: A language and environment for statistical computing (12614)
Basic Local Alignment Search Tool SF Altschul, W Gish, W Miller, EW Myers, DJ Lipman, Journal of Molecular Biology, 1990
49,949 citations according to Google Scholar. Maybe not a pure statistics paper, but it established the importance of melding statistics and algorithms to make big data analysis feasible. And more highly cited than any other paper in this discussion.
What about the time dimension? I’d rather rank by citations/year, accounting for the time since publication.
I think this puts older papers at a significant disadvantage. Maybe applying some kind of weighting taking into account the number of papers published by the journals citing the paper?
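A rough sketch of the citations-per-year ranking suggested above, using only Google Scholar counts and publication years quoted elsewhere in this thread. The reference year of 2014 is an assumption (the year of the post), and the short labels are just for display:

```python
# Citation counts and publication years as quoted in this thread.
# REFERENCE_YEAR is an assumption: the year the post appeared.
REFERENCE_YEAR = 2014

papers = [
    ("Altschul et al., BLAST", 1990, 49949),
    ("Engle & Granger, co-integration", 1987, 23083),
    ("Marquardt, nonlinear least squares", 1963, 21494),
    ("Schwarz, BIC", 1978, 20560),
    ("Bollerslev, GARCH", 1986, 15553),
    ("Pritchard et al., population structure", 2000, 11531),
    ("Mantel & Haenszel", 1959, 10255),
]


def citations_per_year(year, citations, reference_year=REFERENCE_YEAR):
    """Average citations per year since publication (age of at least one year)."""
    age = max(reference_year - year, 1)
    return citations / age


# Sort from highest to lowest citations per year.
ranked = sorted(papers, key=lambda p: citations_per_year(p[1], p[2]), reverse=True)

for title, year, cites in ranked:
    rate = citations_per_year(year, cites)
    print(f"{title} ({year}): {cites} total, {rate:.0f} per year")
```

On these numbers the 2000 genetics paper moves up past several older papers with more total citations, while the 1959 and 1963 papers drop to the bottom. That is the disadvantage for older papers raised in the reply.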
How about the paper “Statistical aspects of the analysis of data from retrospective studies of disease” (Mantel & Haenszel, 1959) with 10255 citations on Google Scholar? A rather influential paper for epidemiologists and other (bio-)medical researchers.
I’ll hazard a note that the paper by Bland and Altman is more or less a port or a slight refinement of Tukey’s mean-difference plot to the medical field. In addition, Eksborg (1981) proposed a similar method based on Deming regression.
However, that often seems to be the case. A method or piece of software is taken up in some new field, and it starts acquiring citations fast. There are several other examples of such papers on this list, such as Felsenstein’s paper, but without it bootstrapping methods might not have been taken up in taxonomy, since the uses of bootstrapping in taxonomy are not instantly obvious from Efron’s papers or book(s).
Somehow back-propagated citations in such cases could be nice to have, though.
For some reason Google Scholar lists Mantel & Haenszel as a book chapter from 2004. Google Scholar, like Wikipedia, is great but not always perfect!
@Andrew: The Mantel-Haenszel paper is from the Journal of the National Cancer Institute (Pubmed: http://www.ncbi.nlm.nih.gov/pubmed/13655060 ). I also have the actual paper lying around somewhere…
19,000 Multiple range and multiple F tests. DB Duncan – Biometrics, 1955 – psycnet.apa.org
14,000 Statistical analysis of cointegration vectors S Johansen – Journal of economic dynamics and control, 1988 – Elsevier
1. How could you miss the BIC Paper: Estimating the dimension of a model G Schwarz – The annals of statistics 1978, Cited by 20560
2. The following paper: Inference of population structure using multilocus genotype data JK Pritchard, M Stephens, P Donnelly, Genetics 2000, cited by 11531
The title and journal look like a genetics paper, but it was written by statisticians and introduced a new statistical model (very similar to latent Dirichlet allocation, later presented by Blei, Ng & Jordan).
Oohhhh, I hate that BIC thing! And how horrible to think that this paper got more cites than AIC. But I promise I didn’t exclude it on purpose; I just didn’t think of looking for it, nor did I come up with it in any of my searches.
Searching some more I came across another Google Scholar error:
24,000 Generalized linear models P McCullagh – European Journal of Operational Research, 1984 – Elsevier
I’m pretty sure these are mostly references to the McCullagh and Nelder book of the same title.
Why do you hate BIC so much? Since reading it first I really liked it – elegant, short and just ‘feels right’. But I’m far from an expert so would like to know – what are the problems/drawbacks of the criteria? do you have any valid criticism to justify your emotional reaction?
My reaction is not emotional. I just don’t think BIC solves the problems it purports to solve. For more on the topic, see this 1995 article I wrote with Rubin.
In some multivariate modeling examples (glm, gnm, glmm, etc), I agree with you. BIC in isolation doesn’t appear to provide the same type of robustness about information ‘lossiness’ that its AIC counterpart does in these instances (I’ll leave AICc alone for now). Still, having both measures as guidance in model development can be far better than one (i.e. how during model development, AIC and BIC can interestingly move in opposite directions as variables of low predictive power begin to be added to a more ‘established’ model – I’ve found this phenomenon to be an interesting indicator for when to rerank remaining variables (i.e. via stepwise) not yet tested for inclusion in or exclusion from a model).
Thanks! I’m not entirely convinced, but this does give arguments against BIC I didn’t think of before.
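For readers following the AIC/BIC exchange, the standard definitions make it clear why the two criteria can move in opposite directions. Here k is the number of fitted parameters, n the sample size, and \hat{L} the maximized likelihood. The notation is added here and is not the commenter’s:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}

% Difference between the two penalties
\mathrm{BIC} - \mathrm{AIC} = k\,(\ln n - 2)

% Adding one parameter that raises 2\ln\hat{L} by \Delta lowers AIC but raises BIC when
2 < \Delta < \ln n
```

Since \ln n exceeds 2 once n is 8 or more, BIC charges more per parameter than AIC in almost any real data set. A weak predictor whose improvement in fit falls between the two penalties is accepted by AIC and rejected by BIC.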
I often cite Baron & Kenny only to say that better methods are now available.
@Peter: April Fool’s……right? :)
While the meshuggas continues in the frequent guffaws over citation frequency (please indulge further with extreme prejudice and sarcasm – and note if something ‘data sciency’ happens to spring forth with some fantastical citation volume), some of the ‘shockers’ mentioned even in jest I think should prompt questions over the nature, and potential tractability of the classification of the citations.
I’ve often been curious about how citations might be employed. Is the citation a fairly insignificant, or even downright irrelevant blurb? Is the citation critical to the research or its argument? If so, in what way and to what degree? Was there a referee/faculty/other request to add it (“Why didn’t you reference so-and-so’s ‘seminal’ paper on xxxx? You might want to consider this. Kind regards, Joey B. Gatekeeper”). Yep, here we go with impact factor rererethought :).
To tag onto the Gscholar thread, I’ve noted more folks using this tool as a single source of citation searching. I’ve noted Wikipedia refs killing entries due to lit published in the 70s, which naturally Gscholar missed, and the armchair refs never thought to consult another citation index. Since when did Gscholar become the bastion of citation indexing? This could be a fun discussion :)
One of the great injustices of citations is that Oaxaca-Blinder got all the credit for Duncan’s decomposition method, which he developed and published about a decade before Oaxaca-Blinder came along. As I recall, O-B recognized this, but that didn’t stop future generations of economists from citing the economists’ derivative paper instead of the sociologist’s original paper. A pretty typical pattern for economists, really.
I am self-interested, as the author of that 1985 paper on applying the bootstrap to inference of phylogenies, which has 25K citations in Google Scholar. Should that be included? It gets more citations than Efron’s paper on the bootstrap. Is that just?
I think that it just depends on what you want to count. Some fields have lots of people using particular statistical methods, and the papers they cite rise high in citation listings. The easiest way to correct for this would be to cite how often particular papers are cited in statistical journals, excluding the journals in application areas. If you did that, Efron’s paper would rise and mine would nearly disappear. (I will note that in the printed collection of Efron’s papers recently published, I commented on the Efron-Halloran-Holmes paper on bootstrapping phylogenies, and made exactly this point).
I am proud to have published a paper that is about 7th in all of statistics, in citations by scientists. But I am happy to acknowledge that in terms of influence within statistics, my paper is a minor one. And that that would be reflected in its being far down the list of statistical papers frequently cited by statisticians.
But what I think is a recipe for wrangling and confusion is to not make this distinction, to take widely cited papers on applications of statistics and try to come up with some reason why they “aren’t really statistics”. You end up becoming the Statistics Police, and making yourself look silly.
I would not want to restrict the count to citations in statistics journals, as I’m particularly interested in statistical methods that have been applied more generally. It happens that biology has a much larger scientific literature than, say, political science (for good reason: we put huge resources in biology so as to ultimately save lives and make people less sick and more comfortable!), and so methods that have particular relevance to biological applications get more citations. That seems fair to me. If a statistical method is important in biomedical research, it’s important in an absolute scale. If a statistical method is important in political science, or astronomy, or some other relatively small field, that’s fine but it’s ultimately having less of an applied impact (at least, as measured by citations, which seems to me to be a decent measure if not perfect). So I have no problem looking at papers that are cited in the biology literature, not at all.
So, just to clarify, my measure of popularity in the above post is not “influence within statistics.” I really am concerned with influence in the scientific literature.
In my post above, I wanted to distinguish between papers that developed new statistical methods from those that apply existing methods. I assume that a paper such as “Refinement of macromolecular structures by the maximum-likelihood method” is highly valuable (given that it has been cited over 10,000 times) but I’d rather not include it on the list of most-cited statistics papers in that, unlike the papers of Baron and Kenny, say, or Zadeh, or Heckman, or the others on that list, it’s not presenting a new method, it’s presenting the application of an existing statistical method. The boundaries here are not precise, but a classification is not useless just because it has some necessarily subjective aspects. My rule of not counting explications of existing methods in the list is similar to my rule of not counting books.
Finally, just to be clear because otherwise readers of your comment might not realize: nobody in the above thread referred to your paper as “not really statistics.”
Sorry if I misinterpreted you as saying my paper was “not really statistics”. I must have misread this comment of yours:
if someone develops new statistical theory or methods, then that’s a statistics paper, and I count as citations all the other statistics papers that cite it, and also all the applied papers that cite it. But if someone takes an existing statistical method and ports it to another field, I don’t consider it eligible for the “most cited statistics papers.”
It was true that my paper was highly cited, but it was not considered true that “that’s a statistics paper”. I will accept your assurance that it is a you never said that it “wasn’t really statistics”. That you said and meant that it was statistics, in a paper, but just wasn’t a “statistics paper”.
I do think that your cri de coeur that
“There’s something just so wrong about an application to the bootstrap having so many more citations than the original bootstrap paper!”
is misguided. That’s just the way applied statistics often works, not a flaw in the system.
I do think that influence within statistics would be interesting to measure, and observing which statistical papers are cited by other statisticians would be a possible approach, requiring manual intervention mostly in choosing an initial list of statistics journals. Your approach requires judgement calls on individual highly-cited papers, a different stage of the process.
I would expect that in any such list of statistics papers influential with statisticians, mine would be basically absent.
That’s right, your paper was highly cited and it is statistics, but I did not want to include it on the list for the same reason that I did not want to include my own Bayesian Data Analysis on the list: I wanted to restrict to papers that developed original methods or syntheses. Arguably your paper is an original method in that it takes originality to apply a method developed in one field to another, just as arguably my book is an original synthesis because there’s a research contribution, not merely expository, to structuring a set of existing methods into a larger conceptual framework. Still, I felt more comfortable putting your paper and my book outside the box, because I wanted to focus on papers that were clearly developing something new in statistics.
I agree that the line is not sharp but I still would like to draw the line somewhere. But in no way is this “line” a disparagement of work outside the line I’m drawing; I just think that contributions such as your paper and my book are a different sort from contributions such as those of Cox, White, and the others in the above list.
As a separate point, I agree with you that my statement, “There’s something just so wrong about an application to the bootstrap having so many more citations than the original bootstrap paper!” is wrong. I hadn’t thought it through, and I think you’re completely right that this is how applied statistics often works, and that’s just fine.
typo in my just-submitted comment: should read ” … your assurance that you never said …”
Marquardt, Donald W. “An algorithm for least-squares estimation of nonlinear parameters.” Journal of the Society for Industrial & Applied Mathematics 11.2 (1963): 431-441. 21494 citations
Bollerslev, Tim. “Generalized autoregressive conditional heteroskedasticity.” Journal of econometrics 31.3 (1986): 307-327. 15553 citations
Engle, Robert F., and Clive WJ Granger. “Co-integration and error correction: representation, estimation, and testing.” Econometrica: journal of the Econometric Society (1987): 251-276. 23083 citations
Citation frequency runs a bit counter to the idea that science is not decided by votes but by being correct (or longest unfalsified, to adhere to Popper’s logic). A bigger concern is that there still is, though it differs from science to science, a lot of material published in languages other than English. This slows reception and certainly distorts the citation index (and vice versa, the index shows only part of the world in science but is taken for the whole, esp. by raters). Developmental psychologist Piaget was publishing in French in Geneva, neighboring on France, Italy and Germany. But his reception history in Germany began after the first English translation appeared in the US. Citations should really be seen in context. Equally one should compute relative indexes, like “how many citations per total number of papers in a field” or “… total number of authors in a field” etc. Certainly it makes a difference if there are only ten papers published in a certain specialty in any given year and one of them gets cited in all of the ten, as against a field with thousands of papers where someone gets a hundred citations?!
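A minimal sketch of the relative index proposed in this comment, citations divided by the number of papers in the field. The function name and the figure of 5,000 papers standing in for “thousands” are assumptions made for illustration:

```python
def field_share(citations, papers_in_field):
    """Citations received per paper published in the field (a hypothetical relative index)."""
    if papers_in_field <= 0:
        raise ValueError("papers_in_field must be positive")
    return citations / papers_in_field


# Small specialty: ten papers in the year, one paper cited in all ten.
small_field = field_share(citations=10, papers_in_field=10)

# Large field: a hundred citations among an assumed 5,000 papers.
large_field = field_share(citations=100, papers_in_field=5000)

print(f"small specialty: {small_field:.2f}")
print(f"large field:     {large_field:.2f}")
```

The paper with ten citations scores 1.0 and the paper with a hundred scores 0.02, so the raw count and the relative index rank the two papers in opposite order.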
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society. Series B (methodological), 1-38.
Read the above post. This paper is already on the list.
The most cited work in history, for example, is a 1951 paper 2 describing an assay to determine the amount of protein in a solution. It has now gathered more than 305,000 citations — a...
So what's the most cited paper of all time? That distinction goes to a 1951 paper by U.S. biochemist Oliver Lowry and colleagues describing an assay to determine the amount of protein in a...
When we excluded conference papers, almost two-thirds belonged to medical and life sciences (86/131). Among the 265, 154 authors produced more than the equivalent of one paper every 5 days for 2 ...
In a study based on the Web of Science database across 118 scientific disciplines, the top 1% most-cited authors accounted for 21% of all citations. Between 2000 and 2015, the proportion of citations that went to this elite group grew from 14% to 21%.
It takes a mammoth 12,119 citations to rank in the top 100 list. Many of the world's most famous papers, including Einstein's special theory of relativity, the determination of DNA double helix and the discovery of high-temperature superconductors did not make it to the list.
With eight nominations, a seminal paper by Syukuro Manabe and Richard T. Wetherald published in the Journal of the Atmospheric Sciences in 1967 tops the Carbon Brief poll as the IPCC scientists’ top choice for the most influential climate change paper of all time.
The most-cited statistics papers ever Posted on March 31, 2014 10:54 AM by Andrew Robert Grant has a list. I’ll just give the ones with more than 10,000 Google Scholar cites: Cox (1972) Regression and life tables: 35,512 citations. Dempster, Laird, Rubin (1977) Maximum likelihood from incomplete data via the EM algorithm: 34,988