3taps vs. Craigslist -- Who owns public data?

tptacek · on Sept 21, 2012

It's not "public data." To illustrate: people can and do make different offers on Craigslist than they do through other channels.

pjlegato · on Sept 21, 2012

Thing is, factual data itself, such as "Someone is offering to the public to sell X for $Y," is not subject to copyright. Copyright-eligible works must include some element of creativity, however small. (https://en.wikipedia.org/wiki/Feist_v._Rural)

So the text of a classified ad itself is indeed copyrightable, but the mere fact that a house at 123 Some Avenue is for rent for $1,000 per month is not. Having learned of that fact from a copyrighted work, and in the absence of an NDA or something similar, you are free to disseminate that fact as you like.

It will be very interesting to see how this plays out.

tptacek · on Sept 22, 2012

That might be true, but it doesn't follow that you can lawfully scrape facts out of copyrighted content on someone else's website.

tptacek · on Sept 22, 2012

'aklofas auto-killed question:

I do believe that is exactly what search engines do. Are they breaking copyright?

This is not a simple question. Huge court cases have been fought over this. In general, things that get Google off the hook in these cases:

* Publishers have the ability to opt out of Google

* Where Google creates copies of information from other sites, those copies are provided to users noncommercially (i.e., they don't make more money when you use their cache).

* Google uses DMCA Safe Harbor to avoid liability, which again turns in part on Google honoring opt-out requests from publishers.

* Google's use of the data is transformative, an idea that in part turns on it not being a direct substitute for the original.

These are not generally arguments that bode well for PadMapper, which is effectively trying to compete with Craigslist using Craigslist data and a better interface. Publishers generally want Google to do things differently... but when push comes to shove, they also really want to be in Google's index. The same is not true for PadMapper.

benmccann · on Sept 22, 2012

How is this different than Feist v. Rural in your mind? A fact is not copyrightable and is scrapable according to that case. I don't see how having copyrighted content next to non-copyrighted content affords any protection to the non-copyrighted content.

tptacek · on Sept 22, 2012

First, Feist says nothing about content being "scrapable". It's a 1991 case. To pull content off Craigslist against their will, you have to cross the CFAA.

Second, phone numbers are raw facts, but advertisements are not; every advertisement ever has been copyrighted, and a whole 11-figure industry depends on that.

rwwmike · on Sept 22, 2012

The distinction 3taps makes is that it doesn't scrape content from Craigslist itself, but from Google and Bing search results.

elsewhen · on Sept 22, 2012

About a week or two after Craigslist filed the lawsuit against 3taps, they put a "noarchive" tag on their listings pages. Since then, their content isn't available in search engine caches.
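To make the "noarchive" mechanism concrete, here is a small check a crawler might run before keeping a cached copy of a page. It is a minimal sketch, assuming the directive arrives in a <meta name="robots"> tag; sites can also send it as an X-Robots-Tag HTTP header, which this sketch does not check. The class name, function name, and sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()  # e.g. {"noarchive", "nofollow"}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def allows_cached_copy(html):
    """Return False if the page asks search engines not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


listing_page = '<html><head><meta name="robots" content="noarchive"></head></html>'
print(allows_cached_copy(listing_page))  # False
```

Honoring the directive is up to the search engine; the tag only states the publisher's wish.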

_3u10 · on Sept 21, 2012

Oh, I like this idea....

Neural net for CL postings, extract the facts, reformat for your own display...

001sky · on Sept 22, 2012

The key trick is the fact separation. The information is in the public domain. So you have to be a "reporter" rather than a "republisher", if that makes sense. The raw data may or may not be subject to certain considerations, but the only really valuable part, the fact/information, is more or less unrestricted.

The question is, will CL now take steps to make the data more "private" (e.g., members-only, even if free), or will they take steps to reinsert themselves into a critical step of actionable use (must log in for contact details, etc.)? CL could start to look like an apartment broker, though, if they follow through with this latter approach.

So this is an interesting dynamic to watch play out. It might also still be interesting to use a-padmapper-like-service, even as a complement to CL's service. If I just had to get 1-5 things, it's no problem. But sorting through 200-400 to find 5 is a PITA, because wading through CL is increasingly inefficient. It's a discovery issue.

welder · on Sept 22, 2012

I made a-padmapper-like-service called CLMapper here:

https://chrome.google.com/webstore/detail/omonmigaleaafgpkgo...

http://techcrunch.com/2012/08/01/clmapper-is-a-padmapper-alt...

Cushman · on Sept 21, 2012

That doesn't mean there is copyright protection for doing so.

anigbrowl · on Sept 22, 2012

I think Tom is right. There's a famous case (http://en.wikipedia.org/wiki/Feist_v._Rural) where a telephone company sued someone for copying subscribers' telephone numbers into a competing directory, and the Supreme Court ruled that you can't copyright facts, which is essentially the same claim 3taps is making on the copyright side. But Rural Telephone Co. had a statutory monopoly, and compiling and publishing the directory for subscribers' benefit was a condition of that monopoly.

Now CL has a de facto monopoly, but it's like many others in that the market has granted it that status to a large extent. CL can afford to look frumpy because it has few competitors and a massive first-mover advantage. You could set up 'Cushman's list' tomorrow and you'd probably crash and burn without them lifting a finger to obstruct you. So CL's listings are more than mere facts; they're the expression of a commercial preference by advertisers. A better comparison would be with stock exchange data: (as far as I know) the copyright on that is watertight because it's partly an expression of member companies' desire to be listed on that exchange as opposed to one of the competing exchanges.

The above is just my hunch about the copyright claim, but I don't think 3taps can succeed with that argument. The antitrust claim, I have no idea - but it should be borne in mind that monopolies are not necessarily bad. Courts nowadays give great weight to consumer benefit rather than abstract rules, and CL delivers an awful lot of consumer benefit by being free for most and charging very modest fees to a small class of advertisers.

rwwmike · on Sept 22, 2012

The whitepages case is exactly the one EFF's attorney Kurt Opsahl referred to when I spoke to him about copyright law and facts.

willrobinson · on Sept 21, 2012

The more Sherman Act claims that are brought against web incumbents, the sooner one is going to stick.

The CL blog stinks of entitlement.

Craigslist "stole" the classifieds business from local newspapers. Now they are accusing others of trying to "steal" the "CL idea" from them.

I am one of (probably) the few who actually prefer a CL-type interface. But the way CL is behaving in the face of competition is just embarrassing. If someone wants to reshape the data, then you have to let them. The data is not the property of CL. If users do not want their data on some other site, then that's their issue to raise, not CL's.

_3u10 · on Sept 21, 2012

Craigslist has never sued anyone for reimplementing their 'idea'.

Craigslist didn't 'steal' the classifieds business; I recall no point in time at which I could find the classifieds from my newspaper copied verbatim onto Craigslist.

Craigslist offered a solution which was superior to newspapers and built a business around it.

As to who holds copyright on the data, that's a question for the courts and is currently undecided. If it were cut and dried who held copyright on the data, then summary judgement would have already been filed.

I don't think Craigslist holds exclusive copyright on the data, so in my mind they may lack standing: whether 3taps is allowed to use the data becomes an issue between 3taps and millions of other users. Perhaps a class action suit is more appropriate.

willrobinson · on Sept 22, 2012

"stole" and "steal" and "CL idea" are all in quotes for a reason. Here, quotes are intended to signify the words quoted do not necessarily carry their dictionary meanings. They carry whatever meaning you assign to them. And that is what you have done. To you, "steal" means verbatim copying. But I might have assigned a different meaning, or maybe the same one. It's a figure of speech.

As for summary judgment, I think you mean _granting_ of summary judgment, not _filing_. But I'm not going to split hairs on the words you used. I know what you meant, even if it wasn't technically correct.

ohashi · on Sept 22, 2012

If we're going to have a serious discussion, let's not put quotes around things and have different interpretations. Let's state exactly what we mean and talk about it.

anigbrowl · on Sept 22, 2012

You'd need to show that CL copied ads from newspapers in order to attract traffic, and I don't believe that's the case. Taking away market share and scraping someone's website for content are vastly different things.

einhverfr · on Sept 22, 2012

But when giving away an exclusive license, as CL requires, you aren't allowed to run the same content in both the newspaper and CL, right?

I have always wondered about running a similar listing somewhere else first, then running something lightly edited on Craigslist, and sending their registered agent, by registered mail, a note that the exclusive license applies only to the relatively minor editorial changes.... I wonder how fast such a listing would get delisted....

willrobinson · on Sept 22, 2012

Do you think most people listing ads on CL read the terms and understand them as you did? (Or were the terms confusing?) Are CL's terms different from what one would normally expect from a newspaper? That is, would you expect that the newspaper would require an exclusive license and prohibit you from running your ad anywhere else?

einhverfr · on Sept 22, 2012

Yes, they are. Normally if someone wants an exclusive right to content they pay the producer for them. Virtually everyone else asks for a non-exclusive license. This is very different and has been discussed here on HN before.

willrobinson · on Sept 22, 2012

And didn't CL change their terms (excl-->nonexcl) after some blogger posted about them? And didn't they make some changes to their site (collaborate with a maps provider so users can now get geo mappings) after filing this lawsuit? I've already forgotten now. This case just seems laughable to me. But what do I know.

willrobinson · on Sept 22, 2012

"Stole" market share. You got it. "[S]tole" was just a figure of speech. And that is in fact what I meant by "stole."

Re: scraping. This is something that has come before the courts a few times (I'm thinking of eBay and a few others, although it might have been called "crawling"). Do you think CL can win on a claim of "scraping"?

tptacek · on Sept 22, 2012

Yes, they probably can. Read those actual cases. In particular, in the cases where the scraper/crawler won, read why they won.

willrobinson · on Sept 22, 2012

Have you read the CL complaint?

001sky · on Sept 21, 2012

3taps will file an antitrust countersuit, alleging that Craigslist maintains a [monopolistic control] over numerous markets related to online classified advertising. If successful, 3taps could open up the market to numerous innovations atop Craigslist data and bring about the user interface and search features, whose lacking seem to be exemplified by the popularity of sites such as PadMapper

-- Antitrust is an interesting angle.

The irony is that newspapers (the competition) were all reliant on classified ads, which have been decimated by Craigslist. As it is, Craigslist is just a good competitor with a more efficient system and better pricing. There are other choices, just at different price points. It's not clear this is a winning formula.

It will be interesting to see if there are other angles of attack -- such as fair use -- which would also allow re-purposing of data posted to quasi-public internet places, that hold non-exclusive rights on said information.

ChuckMcM · on Sept 21, 2012

Sigh, antitrust is a dead end. It might have been possible to argue prior restraint if Craigslist had stuck with their silly 'we own this' clause but they didn't.

My guess is that any judge would ask "Can anyone else make a classifieds site?" and "Does Craigslist interfere with anyone on their site using a different site?". Since the answers will be 'Yes' and 'No', they don't have a monopoly.

SoftwareMaven · on Sept 22, 2012

That seems overly simplistic. Anybody could have used a different browser/OS (there were several available in the 90s), yet MS was still found to be violating antitrust laws.

The real question is whether Craigslist is using its market dominance in a way that hurts consumers. It's a far more nuanced question and one that will be fascinating to watch get answered. Craigslist did some awesome things for consumers once upon a time. Now? I'm not so sure it's all positive.

001sky · on Sept 22, 2012

Your counter-example (MS Windows + Explorer) is not quite on point, though. Consider: Windows had a legitimate monopoly, and then they "abused" it. But you are presuming the existence of a monopoly for CL, so this does not work. CL is an online classified ads business. The barriers to entry are de minimis (unlike OS software + hardware compatibility, etc.). So the premise of monopoly can be attacked fairly and directly.

CL arguably is not abusing a monopoly because they don't even have one. They just have a cheap service, and nobody wants to pay to do it differently. The consideration you raise (are they anti-consumer?) is an interesting one to think through. It does seem they over-reached (re: the exclusivity of data). But without any contractual exclusivity or confidentiality, it's not clear that CL can "monopolize" publicly published data, given that the most valuable bits are (at least for rentals) nothing but street addresses, apartment numbers, and possibly phone numbers. These data are not "intellectual property" in the legal sense under most concepts (patent, copyright, or trade secret).

It would be interesting to see the threshold case argued: when/where do 3 facts become a creative work? It's not a trivial argument, though, because at some stage N words become a lyric, a poem, or an idea. You can't copyright a word, though. It's highly unlikely you can copyright an address. They might try to argue it's like a "recipe" or some such (a list of common ingredients?). Who knows. But somewhere in there, I'm guessing we'll find out.

ChuckMcM · on Sept 22, 2012

Well, if you recall, Microsoft took an additional step: not only did they bundle their browser with Windows, they put language in their contracts with OEMs that forbade them from putting other browsers into the system. Had Craigslist stuck with their idea that you couldn't put your listing anywhere else, then yes, you could have argued they were using their position to restrict trade, but they stopped doing that fairly quickly (according to the article, after talking with the EFF).

einhverfr · on Sept 22, 2012

It's a good angle. It raises abuse of copyright issues, and means Craigslist, if they lose, will have its use of copyright subject to greater scrutiny.

willrobinson · on Sept 22, 2012

If data is deliberately made available to the public, then it's public data. Even if it's made available for free, that won't stop some on the web from trying to repackage it and derive commercial benefit from it. We've seen this many times. These folks have been very lucky that the validity of their assumptions has not been thoroughly tested in court. Somehow they become convinced they own the data. In reality they are merely the distributor.

Even if CL wins, e.g., they are granted an injunction to stop another site from scraping, scrapers can just get the same data from search engine caches. It's hard to argue trespass to chattels when the alleged trespasser never touches your servers. Moreover, search engines are themselves scrapers so clearly scraping is not per se a damaging activity in CL's view, only when it suits CL to view it that way. Arguing that robots.txt is a "license" is a stretch. It's designed to be read by a machine not a human.
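To illustrate the point that robots.txt is written for machines, here is how a crawler might consult one using Python's standard-library urllib.robotparser. The robots.txt contents, the user-agent string "3taps-bot", and the example.org URL are hypothetical; nothing in this thread quotes Craigslist's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: lets Googlebot in everywhere, shuts out everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
listing_url = "https://example.org/apa/123.html"
print(parser.can_fetch("Googlebot", listing_url))  # True: empty Disallow allows all
print(parser.can_fetch("3taps-bot", listing_url))  # False: falls through to "*"
```

The check is voluntary: a crawler that never calls can_fetch() is not technically stopped by anything in the file.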

And what if CL loses? What are the stakes then? Well, I'll let you answer that one. What exactly does 3taps have to lose?

CL claims they own the copyrights to facts and descriptions uploaded by CL users. Are users aware of this? Is it reasonable?

3taps' Answer should be a fun read.

buro9 · on Sept 22, 2012

Does the USA not have a database right?

http://www.caret.cam.ac.uk/copyright/Page92.html

Effectively acknowledging that the contents of a database may have varying copyright (owner, public works, facts, etc), but that the database itself is given protection implicitly if the database is "original and the result of substantial investment".

This is why you find fake data in Google Maps, the Rare Record Price Guide http://www.amazon.co.uk/product-reviews/0953260194 and so on. Because they are representations of databases and all the companies behind them have to do to have the protection is to prove that other representations of the data have been sourced from their database, which they do by pointing at the secret fake data that is part of the database.
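The detection idea described above (plant fictitious "trap" entries, then look for them in someone else's data) boils down to a set intersection. This is a minimal sketch, assuming records can be compared exactly; the Record fields and the trap entries are hypothetical, and a real comparison would have to tolerate reformatting. A match is evidence of copying, not a legal conclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes records hashable, so they can live in sets
class Record:
    title: str
    price: int


# Hypothetical planted entries: fictitious items that exist only in our database.
TRAP_RECORDS = frozenset({
    Record("The Zyxwv Quartet - Live at Nowhere (1968 promo)", 450),
    Record("Quillon Drift - Paper Harbour (test pressing)", 275),
})


def copied_traps(other_dataset):
    """Return any planted records that also appear in someone else's dataset."""
    return TRAP_RECORDS & set(other_dataset)


suspect = [
    Record("Abbey Road", 30),
    Record("The Zyxwv Quartet - Live at Nowhere (1968 promo)", 450),
]
print(copied_traps(suspect))
```

Because the trap entries describe things that do not exist, an independent compiler has no way to arrive at them on their own.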

Does that not exist? It offers protection to any entity that has compiled an original source of data at their expense.

einhverfr · on Sept 22, 2012

The US does not recognize copyrights over databases. The issue though is that depending on the ads in question there may be creative elements.

For example, suppose I place an ad to rent my house out and include just info. Not likely protected but if I do it in iambic pentameter, probably is protected.

What this wouldn't prevent is someone scraping CL for facts (apartments for rent, x bed, y bath, z sq ft), extracting those facts, and arranging them in another order. That doesn't strike me as protectable in the US, even if it involves scraping directly from CL.

einhverfr · on Sept 22, 2012

I think we should create an app called clscraper which does as follows:

Given a set of resources, crawls across the site, extracting specific information from listings (price, number of bedrooms, number of bathrooms, square ft etc, contact info). Puts them in a very simple database. This should be 100% trouble-free copyright-wise at least in the US. The larger issue becomes what happens under other laws. However building a generic tool to extract (non-copyright-worthy!) facts from ads should by itself be more or less trouble-free.

Make this generic enough to work on most ad sites out there. Push the boundary back.
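The extraction step of the proposed clscraper might look roughly like this. It is a minimal sketch, assuming the listing text has already been fetched; it does no crawling, leaves out contact info, and the regular expressions, the listings table, and the sample ad are hypothetical. Real ads phrase these facts in far more ways than these patterns handle.

```python
import re
import sqlite3

# Hypothetical patterns: field name -> (regex capturing the number, converter).
FIELDS = {
    "price": (re.compile(r"\$\s?(\d[\d,]*)"), int),
    "bedrooms": (re.compile(r"(\d+)\s*(?:br|bd|beds?|bedrooms?)\b", re.I), int),
    "bathrooms": (re.compile(r"(\d+(?:\.\d)?)\s*(?:ba|baths?|bathrooms?)\b", re.I), float),
    "sqft": (re.compile(r"(\d[\d,]*)\s*sq\.?\s?ft\b", re.I), int),
}


def extract_facts(ad_text):
    """Keep only bare facts from free-form ad text; the prose is discarded."""
    facts = {}
    for field, (pattern, convert) in FIELDS.items():
        match = pattern.search(ad_text)
        facts[field] = convert(match.group(1).replace(",", "")) if match else None
    return facts


def create_table(conn):
    conn.execute(
        "CREATE TABLE listings (url TEXT PRIMARY KEY, price INTEGER, "
        "bedrooms INTEGER, bathrooms REAL, sqft INTEGER)"
    )


def store(conn, url, facts):
    """Insert one listing's facts; missing facts are stored as NULL."""
    conn.execute(
        "INSERT INTO listings (url, price, bedrooms, bathrooms, sqft) "
        "VALUES (:url, :price, :bedrooms, :bathrooms, :sqft)",
        {"url": url, **facts},
    )


conn = sqlite3.connect(":memory:")
create_table(conn)
ad = "Sunny 2br / 1.5 bath flat near the park, 850 sq ft, $1,000/month. No pets!"
facts = extract_facts(ad)
store(conn, "https://example.org/apa/123.html", facts)
print(conn.execute("SELECT price, bedrooms, bathrooms, sqft FROM listings").fetchone())
# (1000, 2, 1.5, 850)
```

Only the numbers survive into the table; the wording of the ad, which is the part most plausibly covered by copyright, never gets stored.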

Diamons · on Sept 22, 2012

Forgive me if I'm mistaken, but this is how I see this whole ordeal.

Craigslist, a website that built up its user base and functionality by itself (lots of persistent hard work over the years) has listings.

3TAPS believes this information is public and should be freely accessible by anyone.

How does that make any sense at all? If I work for years to build a website, I will use and lease MY damn data however I please. A 3rd party has no right to claim that the website I have created that people use is "public property". I built the house. Now you're going to tell me it's a shelter?

einhverfr · on Sept 22, 2012

Those who doubted that an exclusive license was truly exclusive should take note.

Craigslist here is arguing that they own essentially full copyright on all listings. This means if you advertise on Craigslist and you submit the same listing somewhere else you are guilty of breach of contract, and the other party may be guilty of copyright violation. How long before users start getting sued by one party to such litigation?

This is big trouble. We need to highlight that you give away all rights to your listings, including to the same material published elsewhere.