Hacker News

It's from this ToS page: https://www.cloudflare.com/terms/

2.8 Limitation on Serving Non-HTML Content

...Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately...

A huge text/plain artifact, requested often, would seem to fall into that category of "disproportionate percentage" compared to text/html served.



This limitation apparently doesn't apply to R2 / Workers [0].

Maybe EasyList could host them there? That's what we do [1] (and the dashboards show 400TB+ per month [2], likely inflated by the traffic between Workers and Cloudflare Cache).

[0] https://news.ycombinator.com/item?id=20791660

[1] https://news.ycombinator.com/item?id=30034547

[2] https://nitter.net/rethinkdns/status/1546232186554417152


Cloudflare can decide whom they want to do business with. But a plain text file is in my opinion sort of HTML. At least it is not "non-html" content. A .pdf file would be non-HTML content.

What is also important to note is that the client is being abused, not that the client is abusing the service. That should be taken into consideration when deciding whether someone is breaking the ToS.


I'd agree that's weird. Seems like if it were simply renamed to .html with no content changes, then it would be okay.

> What is also important to note is that the client is being abused, not that the client is abusing the service. That should be taken into consideration when deciding whether someone is breaking the ToS.

As I understand it, this point is moot. From Cloudflare's perspective, the only issue is that the content is non-HTML; it has nothing to do with the rate of traffic (the abuse).


> (i) serving web pages as viewed through a web browser or other functionally equivalent applications, including rendering Hypertext Markup Language (HTML) or other functional equivalents, and (ii) serving web APIs subject to the restrictions set forth in this Section 2.8.

The key is "as viewed through a web browser", IMO. This is not really an API and it's not a webpage; it's a data file, and it would fall under R2 or similar products.


Why do people keep talking as if you can't just navigate to a .txt file in your browser and have it served like any other web content? I actually did exactly that years ago to search for a domain in these kinds of lists.

Cloudflare is balancing on a razor's edge with this ToS technicality.


The TOS aren’t referring to content-type headers, magic bytes, TCP headers, browser support of file formats, or any technical implementation.

To oversimplify, they’re saying Cloudflare’s service is to be used for serving websites to browsers.

Serving a static text file that is primarily used by applications is not in line with their terms of service.

Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms. I’m not sure why you’re trying to “gotcha” through their ToS.

It would be great if Cloudflare would donate resources to EasyList - it would do a lot to help the free and open internet by giving users more power over what gets delivered to their browser. But call that what it is: a donation.


> I’m not sure why you’re trying to “gotcha” through their ToS.

People are doing the opposite: pointing out the hole and asking them to write a better rule. Surely they don't want the list merely converted into HTML.

> They give that away for free [...]

So they should specify the things that influence cost, such as total bytes served, number of files, etc. Currently all you can do is bypass the rule, because you don't know how to cooperate.


It's lawyer speak, but the meaning is clear "this Cloudflare service is for webpages in a browser, not automated data downloads and distribution".


I see, that makes the position more understandable. I guess the same rule would (should) apply if they did indeed simply change the extension.


> Seems like if it were simply renamed to .html with no content changes, then it would be okay.

Imagine you do that and I DDoS the URL. CF will then mitigate this DDoS by, in part, replacing your html with their Browser Integrity Check html.

If you're serving 'web pages and websites', everything continues to work. But what would happen if this list suddenly became an actual webpage?

If your site is serving 'a disproportionate percentage' of non-html you decrease the ability of CF to tell good traffic from bad.


A filter list is definitely not HTML


The minimal spec-valid HTML5 document is currently:

    <!DOCTYPE html>
    <title>a</title>

Practically, browsers will accept omitting both of these, and the spec even allows for omitting the title "if it is provided by a higher level protocol".

So it's not that crazy an argument that a plain text file is an HTML document.


Too technical.

They serve websites to browsers for people to view. This file (be it a properly formatted .html or a .txt) is not a website people visit in their browser; it's used internally by an application. This is the key point.


You're looking at it backwards though. CF doesn't _actually_ care about what the content is, only that they can apply their DDoS protections to it. If you're serving a text file that's much more difficult as they can't replace it with their own content.


Only because they've so comprehensively defined HTML parsing that even parsing random data has a well-defined result.


They host the zipped files for Troy Hunt's Have I Been Pwned...


That's a special project they decided to take on, not subject to the standard ToS.


They should put EasyList in that special project category. It's just too important to the internet.


My best guess is that CloudFlare wrote this to prevent folks from serving big binary files like photo, music, or video and this txt file case was an unintended condition that happens to work to CloudFlare's advantage.

text/plain, though, is decidedly not text/html, and I would expect CloudFlare to do some on-the-fly optimizations that are aware of the structure of an HTML file and that save terabytes a day at their scale.


> My best guess is...

Some think it's very Oracle of Cloudflare to do so. I do not blame them.


This doesn't sound right to me. Cloudflare also protects web APIs. This text file is an extremely simple web API, but it is still a web API.


If web APIs were a disproportionate share of what was served on some customer's free CF plan, compared to the cached HTML, then that doesn't match their ToS.


Sounds like it is meant to deal with multimedia mostly?

But anyway, just rename .txt to .html and you're done.


I imagine that might help with automated ToS rate limiting, but eventually someone at Cloudflare will probably cut them off. It's plain text, but it's basically serving a distributed database. And a hint at their scale: "100TB of 'Access Denied' served up monthly".

Cloudflare just seems to be trying to limit the free tier to "caching website html for the purpose of showing it to humans". They have pricing and plans for things other than that.


Simple, but won't it break all sorts of automation down the line? All the other adlists are .txt, and I don't know how tools would handle other file types, even if the content is unchanged.


Determining file type from the file name suffix is a fool's game and always was.


I hear you there. I'm thinking more that someone probably hard-coded the .txt extension somewhere, so something is likely to fall apart when handling that file.


Is it? Seems superior to arbitrary magic numbers or headers, and God forbid full naive parsing, in most ways.


I doubt there is any solution that is both robust and simple. In a sense, it is the same problem as that which ad blockers are attempting to solve.


What's wrong with storing and delivering the intended content type as metadata, whether that's headers or filesystem metadata like in Mac OS X?


Transmitting in-band (headers) seems ripe for arbitrary complexity. Someone out there would write a Turing-complete header DSL. And then someone else would write an incompatible alternative implementation.

At least file extension is limited and externally visible (and thus accountable) to third party behavior, which should limit the worst complexity excesses.

Is filesystem metadata actually different (theoretically) from extension? Or just data in a different format?

Extension seems a nice balance between simplicity / brevity and utility, albeit as a hint, not a commandment.
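The trade-off between trusting the suffix and inspecting the bytes can be made concrete. Below is a minimal Python sketch: `type_from_extension` uses the standard library's `mimetypes.guess_type`, while `type_from_content` is a hypothetical toy sniffer that checks a few magic-number prefixes plus a crude HTML/plain-text heuristic. It is not the WHATWG MIME Sniffing algorithm, and the function names and sample filter body are illustrative assumptions.

```python
import mimetypes
from typing import Optional

# A few well-known magic-number prefixes (illustrative, far from complete).
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def type_from_extension(filename: str) -> Optional[str]:
    """Guess a MIME type from the file name suffix alone."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def type_from_content(data: bytes) -> str:
    """Toy sniffer (hypothetical): magic numbers first, then an HTML/text guess."""
    for prefix, mime in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return mime
    # Look only at the start of the body for an HTML doctype or root tag.
    head = data.lstrip()[:64].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    # Anything that decodes as UTF-8 is treated as plain text.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"


if __name__ == "__main__":
    body = b"[Adblock Plus 2.0]\n||ads.example^\n"  # hypothetical sample list
    for name in ("filters.txt", "filters.html"):
        print(name, type_from_extension(name), type_from_content(body))
```

Renaming `filters.txt` to `filters.html` changes the extension-based answer but not the content-based one, which is the crux of the "just rename it" suggestion upthread.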


Fun stuff like embedding data into jpgs or pngs.


Then CF replaces the HTML with their Browser Integrity Check. How does the app deal with the list becoming a real "Checking your browser" HTML page?
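One way a client could avoid ingesting such a page is to validate each download before replacing its cached copy. A minimal sketch, assuming the list is normally served as plain text (not HTML) and that an interstitial would arrive as an HTML document; `accept_list_update` and `update_cached_list` are hypothetical names, not part of any real ad blocker.

```python
from typing import Optional


def accept_list_update(content_type: Optional[str], body: str) -> bool:
    """Return True only if a download still looks like a plain-text filter list.

    Assumption: a challenge/interstitial page arrives as HTML.
    """
    if content_type is not None:
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == "text/html":
            return False
    # Also reject bodies that start like an HTML document, whatever the header says.
    first = body.lstrip()[:15].lower()
    if first.startswith("<!doctype") or first.startswith("<html"):
        return False
    # An empty body is not a usable update either.
    return bool(body.strip())


def update_cached_list(cached: str, content_type: Optional[str], body: str) -> str:
    """Keep the last good copy when the new download fails the check."""
    return body if accept_list_update(content_type, body) else cached
```

Note that this check would also reject a list deliberately wrapped in HTML, so it only fits if the list stays plain text.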


If I can read it in Lynx, it is web content.


From a legal perspective I can understand such wording, but I wonder why an engineer would simply tell a (non-paying) customer that they are violating the ToS without thinking it through.

I mean, one could simply wrap the content in an HTML body and change the extension, but that would actually increase the data load for no good reason. So it is complete nonsense to complain about .txt files being served.


The solution seems simple, just wrap it in a trivial HTML envelope. Enclose it in <pre> tags if needed.
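A minimal sketch of that envelope in Python, using the standard library's `html.escape` and `html.unescape`. The function names and the default title are hypothetical. Escaping matters because filter rules can contain `&` and `<`; each `&` grows to `&amp;`, which, together with the fixed wrapper, is the extra data load mentioned above. This only shows the mechanics; it says nothing about whether Cloudflare would treat the result as HTML under its ToS.

```python
import html


def wrap_in_html(text: str, title: str = "filter list") -> str:
    """Wrap plain text in a minimal HTML5 document inside <pre>."""
    # HTML parsers ignore one newline directly after <pre>, so emit one
    # deliberately; a leading newline in `text` then survives rendering.
    return (
        "<!DOCTYPE html>\n"
        f"<title>{html.escape(title)}</title>\n"
        f"<pre>\n{html.escape(text, quote=False)}</pre>\n"
    )


def unwrap_from_html(document: str) -> str:
    """Recover the original text from a document produced by wrap_in_html."""
    start = document.index("<pre>\n") + len("<pre>\n")
    # Escaping guarantees the payload itself contains no literal "</pre>".
    end = document.rindex("</pre>")
    return html.unescape(document[start:end])


if __name__ == "__main__":
    rules = '||example.com/ad?a=1&b=2^\nexample.org##div[data-x="<ad>"]\n'
    doc = wrap_in_html(rules)
    print(len(rules), len(doc))  # size before and after wrapping
    assert unwrap_from_html(doc) == rules
```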



