Classic Papers: Articles That Have Stood the Test of Time

drfuchs · on June 17, 2017

They completely missed, with 1800+ citations, the winner of the “Theory of Cryptography Conference (TCC) 2016 Test of Time award”: “Calibrating Noise to Sensitivity in Private Data Analysis” by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Oh, it also just won the 2017 Gödel Prize; it really ought to be at the top of both the “Theoretical Computer Science” and “Computer Security and Cryptography” lists.

Worse still, with ~3000 citations, Dwork’s “Differential Privacy” (ICALP (2) 2006: 1-12), should rank even higher in the Theoretical Computer Science list. But Google Scholar has completely lost track of that foundational paper; it’s got it all confused with a completely different paper, Dwork’s 2008 “Differential Privacy: A Survey of Results”. Note that this also means that anybody searching for the general topic “differential privacy” on Google Scholar will not get to see the most-cited paper about it! https://www.microsoft.com/en-us/research/wp-content/uploads/...

Disclaimer: Dwork and I have been seen together, for 24 years.

jventura · on June 17, 2017

From the article: "This release of classic papers consists of articles that were published in 2006..". Your second one could be there (I haven't looked for it), but you're mentioning some problems with the article, maybe it's that..

drfuchs · on June 17, 2017

They were both published in 2006, so not sure what you're getting at.

Google Scholar and Sean Henderson are promulgating a false historical record, and there seems to be no way to inform them so that they may correct themselves, other than whining here on HN and hoping they notice. Anybody have any other suggestions?

frankmcsherry · on June 17, 2017

https://support.google.com/scholar/contact/general

drfuchs · on June 27, 2017

For the record: Re-confirmed that Google Scholar's "support" page is useless. It simply replies with an email indicating that while you can go ahead and complain, they're not going to bother to do anything to fix their algorithm no matter how wrong-headed it is, so tough luck for you and the rest of the unsuspecting, misinformed, current and future universe. Same result as from the previous 3 tries to correct the record. And same as with recent attempts to contact them via email and even USPS snail-mail.

They don't even seem bothered that this in turn leads to Google's own data-miners publishing false results based on Google Scholar's error-filled data; Sean Henderson and Anurag Acharya both have their names on the erroneous blog entry, and still it remains uncorrected. One might think that they would't want their names associated with false information, and messing up the true historical record.

Anyway, congratulations on being presented with ACM SIGACT's 2017 Gödel Prize "for the invention of Differential Privacy" in the "Calibrating Noise to Sensitivity in Private Data Analysis" paper at last week's ACM Symposium on Theory of Computing (STOC). Too bad Google Scholar seems intent on hiding it. Maybe all the search-terms I've semi-awkwardly included here will help future (re)searchers find it, as well as Dwork's "Differential Privacy" ICALP 2006.

drfuchs · on June 18, 2017

(Frank?! Congratulations!) Yeah, that's a black hole, at least as of some time ago. But, in your honor, I'll try it again.

nyrulez · on June 17, 2017

This has left me scratching my head - why just 2006 ? Having just one year of publications and labeling them "Classic Papers" is pretty misleading as the term is used to indicate a wide gamut of publications over a much longer period of time. It should be just called "Top papers or research from 2006". Unless this expands to at least cover a decade, it shouldn't be labeled as such.

This almost sounds like collecting my most liked pics from 2006 on Facebook and creating an album "Best moments of my life".

Do they not have data before 2006 ?

vitus · on June 17, 2017

I was expecting "classic" to mean papers like Part-time Parliament, Mathematical Theory of Communication, Unix Time-Sharing System, etc. Certainly was in for a surprise...

They certainly do have data prior to 2006, based on Google Scholar results. It seems like an odd choice, but it's explicitly stated that these articles were chosen because they're roughly 10 years old.

I do find some of their choices a bit odd, though. Surely they can come up with better examples? The BigTable paper (OSDI '06) out of Google itself has far more citations (~4x per google scholar citation counts) of the highest-ranked DB paper, and I'd say it's much higher impact than any of them, being one of the early papers of the NoSQL movement. I'd understand if the algorithm in play were more nuanced, but the introductory page explicitly states that these are the most-cited papers of 2006, which doesn't seem to be the case.

Obligatory disclaimer: despite my current employment status, these views don't represent Google's.

a3n · on June 17, 2017

> This has left me scratching my head - why just 2006 ?

As they said in the post, they're measuring cites 10 years after. It's 2017. I imagine 2006 is their "inaugural year."

RhysU · on June 18, 2017

Measuring citations by year Y+10 for publication year Y could be run for all historical years pretty easily.

tandav · on June 17, 2017

I remember how I was googling deeply for the most cited papers / science articles of all time and didn't find anything.

I naively thought that it is a simple thing and someone have that "collection of best articles".

Things are going to more like "this is hard problem"

enimodas · on June 17, 2017

There was http://www.nature.com/news/the-top-100-papers-1.16224 in 2014

blt · on June 17, 2017

Wow, I was not expecting so many to be biology and chemistry. Good reminder that computer science isn't the whole world. Nice to see EM, simulated annealing, and Levenberg Marquardt in the bottom 50.

tgb · on June 17, 2017

Some fields cite more and publish more than other fields. It doesn't really make any sense to compare citation counts between fields. Moreover, Google's ranking removes review articles and many other things - it purely looks for 'new research' articles. Those get cited much less frequently than reviews or methods papers.

amelius · on June 17, 2017

This should be as simple as running a query in e.g. scholar: select area/field, sort by most cited, while ignoring citations that occur within x years of publication. Also, one could expand the citation relation transitively (like pagerank but without cycles).

ThomPete · on June 17, 2017

They will release more.

diggan · on June 17, 2017

Nice list, but as many other said, seems to only be for 2006.

For more papers, there is a nice list here: http://jeffhuang.com/best_paper_awards.html not limited to 2006

There is a bunch more places to get papers listed here too: https://github.com/papers-we-love/papers-we-love#other-good-...

bokertov · on June 17, 2017

Is the author JH He of the #1 paper in computational mathematics a self citing spammer?

https://www.google.com/amp/s/selfcitation.wordpress.com/2011...

whynotqat · on June 17, 2017

As one might guess, there is a lot wrong with this list even within there stated goals. My examples are drawn from mathematics, since that's what I know. They appear to use the journal to classify category, which doesn't work very well since many of the best results are published in general journals. Additionally, since citation counts vary so widely between sub-fields, there is a strong pull towards selecting misclassified work from higher-citation fields. For example the paper "High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension" is listed in geometry but belongs elsewhere, and there are no probability papers in the category "Probability and Statistics with Applications". Also, the "Pure & Applied" category is meaningless. That list seems to be the most cited papers from five arbitrary journals. I guess it's a reminder that these problems are hard to automate, and that your work doesn't have to be perfect to share.

glup · on June 17, 2017

Cognitive Science suffers from the same problem of misclassifications from higher-citation fields (neuroscience).

Agreed that projects don't have to be perfect but it does have to have some functionality to ship... I don't see how I could use this could help me construct a course reading list or to improve my understanding of my academic field, given the problems.

phreeza · on June 17, 2017

There wasn't even a neuroscience category on the page, only neurology and neurosurgery.

stonesixone · on June 17, 2017

Also, were you able to find any papers in number theory? That's a huge gap as it is one of mathematics's primary subfields. Analysis seems to represented, as well as topology (via "geometry").

dev_tty01 · on June 17, 2017

Should be labeled "Top cited papers of 2006" or something similar. Calling this collection "Classic Papers" is misleading at best.

a3n · on June 17, 2017

No, it's an exactly accurate name for their feature. For which they have only yet released the 2006 edition.

logicallee · on June 17, 2017

Out of curiosity, does anyone have any examples of scientific books (or papers) that are the exact opposite: influential or famous at the time but completely and utterly destroyed by the test of time. Like, that seem silly to us in how completely and utterly wrong they turned out to be in their every single conclusion.

I'm thinking about research versions of Lord Kevin's favorite edict: "Heavier than air flying machines impossible" or the patent person (examiner? head of patent office?) who in the nineteenth century said everything that can be invented has been invented.

ferdterguson · on June 17, 2017

Not a field, but a person who everyone thought was Nobel prize bound and it turned out to be all BS. You may think that it's just one person, but the amount of research dollars that got allocated to try and prove or disprove all of this work would be staggering. https://en.wikipedia.org/wiki/Schön_scandal

Houshalter · on June 17, 2017

Sure fields of research go obsolete all the time. E.g. much of the computer vision stuff from 2006 is basically dead now. If you go further back, a lot of early AI research was exciting at the time, but is entirely forgotten about now.

mindcrime · on June 17, 2017

If you go further back, a lot of early AI research was exciting at the time, but is entirely forgotten about now.

Interesting that you would use that example. I suspect, although I can't prove, that this is largely a mistake. Or maybe not so much a mistake as a choice that will wind up being revisited. That is, I think there is still a lot of "meat on the bone" for many of the AI techniques that were being explored in the 70's and 80's, and we will see another round of things suddenly coming back into favor at some point. It's happened before... remember when ANN's were completely out of vogue, and the computing power and data availability caused a sudden resurgence in interest in those? I would not be surprised to see similar things happening w/r/t various aspects of GOFAI.

More likely, I think we'll see additional integration / hybridization of probabilistic / pattern matching systems (using ANN's / Deep Learning / etc.) and symbolic processing and automated reasoning.

'course, I might be totally wrong, but that's my feeling ATM.

Finnucane · on June 17, 2017

There's some difference between obsolete--a method that gets replaced by an improved method later on--and just plain wrong.

dekhn · on June 17, 2017

Any paper about the luminiferous aether from before the Michelson Morley experiment?

glup · on June 17, 2017

Methodology is not described and the resulting collections are of notably poor quality. Given Google's privileged position in knowledge production I wish they would be far more careful in cases like this.

ivan_ah · on June 17, 2017

For everyone disappointed to see papers only from 2006, here is a consolation prize. Creating a Computer Science Canon: a Course of “Classic” Readings in Computer Science: http://l3d.cs.colorado.edu/~ctg/pubs/sigcsecanon.pdf (CS only, date range = [1806:2006])

kensai · on June 20, 2017

This is also very interesting: the AAAI Classic Paper Award.

The AAAI Classic Paper award honors the author(s) of paper(s) deemed most influential, chosen from a specific conference year. Each year, the time period considered will advance by one year.

Papers will be judged on the basis of impact, for example:

    Started a new research (sub)area
    Led to important applications
    Answered a long-standing question/issue or clarified what had been murky
    Made a major advance that figures in the history of the subarea
    Has been picked up as important and used by other areas within (or outside of) AI
    Has been very heavily cited

https://aaai.org/Awards/classic.php

joatmon-snoo · on June 17, 2017

Noticeably missing: Gray and Lamport's "Consensus on Transaction Commit"

spatulon · on June 17, 2017

That paper appears to have been published in 2004, not 2006.

joatmon-snoo · on June 17, 2017

To arXiv in '04, but to ACM in '06.

idlewords · on June 17, 2017

In the Middle Eastern and Islamic Studies section, five of the ten cited papers are about Turkey. Another is about representation of Islam in the Australian media.

This... doesn't seem like a very representative selection of 'timeless' papers.

nickpsecurity · on June 17, 2017

The security examples were weak. Far more influential were the Ware or Anderson reports, MULTICS security evaluation, anything describing Orange Book-style systematic assurance of whole systems, at least one on capability-security or by Butler Lampson (did access control too), something on monitoring/logging, something on static analysis, CompCert or Coq, and so on.

Things that had a major impact on the problems they focused on which many other papers doing something similar built on or constantly referenced. I'm skeptical of citations in general since those who chase them usually do a high number of quotable papers in whatever fad is popular instead of hard, deep, and critical work. Those I listed are the latter with who knows what citations. The collection is probably still nice for finding neat ideas or just learning in general.

nadim · on June 17, 2017

Classic Albums: 5 Mics in the Source

https://en.wikipedia.org/wiki/The_Source#The_Source.27s_Five...

Aardappel · on June 17, 2017

No "programming language design and implementation" category?

seanmcdirmid · on June 17, 2017

Looks like those would be under "Software systems."

Aardappel · on June 17, 2017

Seems like at best 1 out of 10 in that category qualifies.. but yes, this is just 2006, so hard to tell.

nickpsecurity · on June 17, 2017

Should be easy. Just look for ALGOL, LISP, Pascal (or Wirth), BCPL/C, ML, Haskell, Prolog, and META II. They should all be there since tons of CompSci work and many commercial products came from these. About six also establish a new or altered paradigm of programming, too. If it doesn't have most of them, then the list is bullshit. If it does, then it's solid.

Note: They all came way before 2006, too. Should've been easy for authors to find. :)

hkon · on June 18, 2017

For computer science, I find most useful papers are from before 1990. Looking forward to that being included.

threepipeproblm · on June 17, 2017

Ironically, you have to copy, paste and Google the titles of most of these to find downloadable versions.

blt · on June 17, 2017

sci-hub.cc can help with those that don't show a PDF in the Google results.

threepipeproblm · on June 17, 2017

I hope sci-hub and libgen can stay afloat. sci-hub.ac is also up atm. "to remove all barriers in the way of science" -- be advised it's not necessarily legal.

husamia · on June 20, 2017

all the articles were only published in 2006! I tried to change the data to 2017 but it didn't work

teddyh · on June 17, 2017

Flagged for misleading headline.

qrbLPHiKpiux · on June 17, 2017

A lot has happened in my profession since 2006...

jldugger · on June 17, 2017

But it wouldn't necessarily be a 'classic'.

The point of the exercise is to find papers that are widely considered valuable, especially to other researchers. To do this, they're using citation counts.

There's obviously a number of problems with citations, including self-cites, negative citations ("Alice & Bob '06 shook the community when they found things, but our better, larger study finds no evidence of any effect"), and such. But it makes sense for a company built upon citation rank indexing to rely on such methods =)

seasonalgrit · on June 17, 2017

"a collection of highly-cited papers"

no, a collection of titles. a collection of papers would be very useful; these are just links, e.g., to paywalled sites.