
What does falsely claiming that Google invented MR make him, then?


I see the Google fanboys and wannabes are out in force on this thread.


I see the crazies are out trying to redefine MapReduce as just being map and reduce and completely missing the point. But whatever, they've probably never seen big data loads and are definitely not involved in the industry.
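To make the distinction concrete: the functional primitives fit in a few lines. Below is a toy, single-machine sketch of the classic word-count example (the helper names are made up; nothing beyond the Python standard library is assumed). Everything MapReduce-the-system actually contributes, partitioning work across thousands of machines, the distributed shuffle, re-executing failed tasks, is exactly what this sketch leaves out.

    from collections import defaultdict

    def map_phase(doc_id, text):
        # Mapper: emit a (word, 1) pair for every word in a document.
        for word in text.split():
            yield (word, 1)

    def shuffle(pairs):
        # Group intermediate pairs by key: the step the bare map/reduce
        # primitives don't give you, and trivial only on one machine.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(word, counts):
        # Reducer: sum the counts emitted for one word.
        return word, sum(counts)

    docs = {"d1": "big data big hype", "d2": "big market"}
    pairs = [p for i, t in docs.items() for p in map_phase(i, t)]
    counts = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
    print(counts)  # {'big': 3, 'data': 1, 'hype': 1, 'market': 1}

None of this is the hard part. The hard part is making it keep running when the machines executing it fail mid-job.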


Ooh, scary big data.

I could run your workloads in Excel without breaking a sweat. But go on kidding yourself.


I don't think Excel scales to 10 or 100 TB of data.


In all seriousness though, I was running data sets that big in Oracle in 2006. You can see why I don't take "big data" seriously.


There's certainly hype around big data nowadays, often to the point of being ridiculous.

The point is that people are starting to use the term to describe something that isn't even technical anymore, let alone a measure of the actual amount of data: merely using data to drive decision making.

This is not a new thing [0], yet there is a clear trend showing how this kind of dependency is shifting from auxiliary to generative; some of the reasons are:

1. cheaper computing power and storage

2. increased computing literacy among scientists and non-scientists alike

3. increased availability of digitised content in many areas that capture human behaviour

Where there's demand, there's a business opportunity. One thing that is genuinely new and big about Big Data is the market. It should be called "Big Market (of data)".

It's an overloaded term. IMHO it's counterproductive to let the hype around Big Data as a business term pollute the discussion about what contribution Google and others have made in the field of computer science and data processing.

So what did Google really invent? Obviously the name and the concept behind MapReduce weren't new, nor were they the first to process large amounts of data.

Size and growth are two key factors here. NIH syndrome may well have played a part at Google, but it's also possible that existing products simply couldn't satisfy those two requirements. It's difficult to tell exactly how large, since Google is not very keen on releasing numbers, although some announcements exist, like [1]: "Google processed about 24 petabytes of data per day in 2009".

24 PB is more than 100 times 200 TB, and that's per day: over a year the factor climbs past 40,000, so 10,000 is, if anything, a conservative round number. Stop and think for a moment about what 10,000 means. It's enough to completely change the problem, almost any problem. A room full of people becomes a metropolis; a US annual low-wage salary becomes 100 million dollars, more than the annual spending of Palau [2]. It's silly to make these comparisons, of course, but it's hard to think of anything that doesn't change profoundly when scaled by 10,000. Hell, this absurdly long post is well under 10k characters!
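If you want to check those numbers, here is the back-of-the-envelope arithmetic as a few lines of Python; the only inputs are the 24 PB/day figure from [1] and the 200 TB figure above, treated as decimal units:

    # Scale check: Google's 2009 daily ingest [1] vs. a 200 TB data set.
    per_day_pb = 24                                 # PB per day, from [1]
    data_set_tb = 200                               # the 2006 Oracle-era figure above

    daily_ratio = per_day_pb * 1000 / data_set_tb   # 120x, every single day
    yearly_ratio = daily_ratio * 365                # 43,800x over a year
    per_second_gb = per_day_pb * 1e6 / 86400        # ~278 GB ingested per second

    print(daily_ratio, yearly_ratio, round(per_second_gb))

Sustaining roughly 278 GB of ingest per second is a different engineering problem from any single-box database, whatever the nominal size of the data set.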

To stay in the realm of computer science: processor performance didn't increase by a factor of 10,000 between a 1978 PDP-11 and a 2005 Xeon [3].

Working at that scale poses unique problems, and that's where the real contributions made by Google's engineers and engineering culture to the advancement of the field lie. If anything, just knowing that it's possible, and having some account of what they focused on, is inspiring.

This is the Big Data I care about. It's not about fanboyism. It's cool, it's real, it's rare. Arguing about who invented the map/reduce mechanics is like arguing that hierarchical filesystems were already there, hence any progress made in that area by countless engineers is trivial.

[0] Historical perspective: James Gleick, http://en.wikipedia.org/wiki/The_Information:_A_History,_a_T...

[1] http://dl.acm.org/citation.cfm?doid=1327452.1327492

[2] https://www.cia.gov/library/publications/the-world-factbook/...

[3] http://www.cs.columbia.edu/~sedwards/classes/2012/3827-sprin...



