It's from this tos page: https://www.cloudflare.com/terms/ *2.8 Limitation on Se...

ignoramous · on Oct 19, 2022

This limitation apparently doesn't apply to R2 / Workers [0].

Maybe EasyList could host them there? That's what we do [1] (and the dashboards show 400TB+ per mo [2], likely rigged by the traffic between Workers and Cloudflare Cache).

[0] https://news.ycombinator.com/item?id=20791660

[1] https://news.ycombinator.com/item?id=30034547

[2] https://nitter.net/rethinkdns/status/1546232186554417152

tomschwiha · on Oct 19, 2022

Cloudflare can decide whom they want to do business with. But a plain text file is in my opinion sort of HTML. At least it is not "non-html" content. A .pdf file would be non-HTML content.

What else is important to note is that the client is being abused, not the client abusing the service. That should be taken into consideration when deciding if someone is breaking the ToS.

lcnPylGDnU4H9OF · on Oct 19, 2022

I'd agree that's weird. Seems like if it were simply renamed to .html with no content changes, then it would be okay.

> What else is important to note is that the client is being abused, not the client abusing the service. That should be taken into consideration when deciding if someone is breaking the ToS.

My understanding has this as moot. The issue from Cloudflare's perspective is only that the content is non-HTML and doesn't have anything to do with the rate of traffic (the abuse).

bombcar · on Oct 19, 2022

> (i) serving web pages as viewed through a web browser or other functionally equivalent applications, including rendering Hypertext Markup Language (HTML) or other functional equivalents, and (ii) serving web APIs subject to the restrictions set forth in this Section 2.8.

The key is "as viewed through a web browser" imo; this is not really an API and it's not a webpage. It's a datafile and would fall under R2 or similar things.

Spunkie · on Oct 19, 2022

Why do people keep talking like you can't just navigate to a txt file in your browser and have it serve as any old web content? Which is something I have actually done many years ago to search for a domain in these types of lists.

Cloudflare is balancing on a razor's edge with this ToS technicality.

r3trohack3r · on Oct 19, 2022

The TOS aren’t referring to content-type headers, magic bytes, TCP headers, browser support of file formats, or any technical implementation.

To oversimplify, they’re saying Cloudflare’s service is to be used for serving websites to browsers.

Serving a static text file that is primarily used by applications is not in line with their terms of service.

Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms. I’m not sure why you’re trying to “gotcha” through their ToS.

It would be great if Cloudflare would donate resources to EasyList - it would do a lot to help the free and open internet by giving users more power over what gets delivered to their browser. But call that what it is: a donation.

LawTalkingGuy · on Oct 20, 2022

> I’m not sure why you’re trying to “gotcha” through their ToS.

People are doing the opposite, pointing out the hole and asking them to get a better rule. Surely they don't just want the list merely converted into html.

> They give that away for free [...]

So they should specify things that influence cost such as total bytes served, number of files, etc. Currently all you can do is bypass the rule because you don't know how to cooperate.

bombcar · on Oct 19, 2022

It's lawyer speak, but the meaning is clear: "this Cloudflare service is for webpages in a browser, not automated data downloads and distribution".

lcnPylGDnU4H9OF · on Oct 19, 2022

I see, that makes the position more understandable. I guess the same rule would (should) apply if they did indeed simply change the extension.

kenmacd · on Oct 20, 2022

> Seems like if it were simply renamed to .html with no content changes, then it would be okay.

Imagine you do that and I DDoS the URL. CF will then mitigate this DDoS by, in part, replacing your html with their Browser Integrity Check html.

If you're serving 'web pages and websites' everything continues to work. What would happen if this list suddenly became an actual webpage?

If your site is serving 'a disproportionate percentage' of non-html you decrease the ability of CF to tell good traffic from bad.

LinAGKar · on Oct 19, 2022

A filter list is definitely not HTML

Macha · on Oct 19, 2022

The minimal spec valid HTML5 document is currently:

    <!DOCTYPE html>
    <title>a</title>

Practically, browsers will accept omitting both of these, and the spec even allows for omitting the title "if it is provided by a higher level protocol".

So it's not that crazy an argument that a plain text file is an HTML document.

pests · on Oct 20, 2022

Too technical.

They serve websites to browsers for people to view. This file (be it properly formatted .html or .txt) is not a website people go to in their browser; it's used internally by an application. This is the key point.

kenmacd · on Oct 20, 2022

You're looking at it backwards though. CF doesn't _actually_ care about what the content is, only that they can apply their DDoS protections to it. If you're serving a text file that's much more difficult as they can't replace it with their own content.

LinAGKar · on Oct 27, 2022

Only because they've so comprehensively defined HTML parsing that even parsing random data has a well-defined result.
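
To illustrate the point loosely: feeding plain filter rules to a lenient HTML parser gives a well-defined result, namely zero tags and one run of text. This is only a sketch. Python's `html.parser` is a forgiving tokenizer, not a spec-compliant HTML5 parser, and the rules below are made up for illustration.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Record every start tag and text chunk the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


# Made-up rules in an Adblock-style syntax (illustrative only).
rules = "||ads.example.com^\n||tracker.example.net^$third-party\n"

parser = Collector()
parser.feed(rules)
parser.close()  # flush any text the parser is still buffering

parsed_text = "".join(parser.text)
print("tags seen:", parser.tags)
print("text preserved:", parsed_text == rules)
```

The parser never complains: the whole file comes back as character data, which is roughly why "it parses as HTML" says little about whether something is a web page.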

briffle · on Oct 19, 2022

They host the zipped files of content for haveIbeenPwned for Troy Hunt...

sp332 · on Oct 19, 2022

That's a special project they decided to take on, not subject to the standard ToS.

matheusmoreira · on Oct 20, 2022

They should put EasyList in that special project category. It's just too important to the internet.

spatley · on Oct 19, 2022

My best guess is that Cloudflare wrote this to prevent folks from serving big binary files like photos, music, or video, and this txt file case was an unintended condition that happens to work to Cloudflare's advantage.

text/plain though is decidedly not text/html, and I would expect Cloudflare to potentially do some on-the-fly optimizations that are aware of the structure of an HTML file and save terabytes a day at their scale.

ignoramous · on Oct 19, 2022

> My best guess is...

Some think it's very Oracle of Cloudflare to do so. I do not blame them.

Slix · on Oct 19, 2022

This doesn't sound right to me. Cloudflare also protects web APIs. This text file is an extremely simple web API, but it is still a web API.

tyingq · on Oct 19, 2022

If the web APIs were a disproportionate amount of what was served on some customer's specific free CF plan, as compared to the cached HTML, then that doesn't match their ToS.

bornfreddy · on Oct 19, 2022

Sounds like it is meant to deal with multimedia mostly?

But anyway, just rename .txt to .html and you're done.

tyingq · on Oct 19, 2022

I imagine that might help with automated ToS rate limiting, but eventually someone at Cloudflare will probably cut them off. It's plain text, but it's basically serving a distributed database. And a hint at their scale is "100TB of 'Access Denied' served up monthly".

Cloudflare just seems to be trying to limit the free tier to "caching website html for the purpose of showing it to humans". They have pricing and plans for things other than that.

Maxburn · on Oct 19, 2022

Simple, but will it break all sorts of automation down the line? All the other adlists are txt and I don't know how they would handle other file types, even if the content is unchanged.

PaulDavisThe1st · on Oct 19, 2022

Determining file type from the file name suffix is a fool's game and always was.

Maxburn · on Oct 19, 2022

I hear you there. I'm more thinking someone probably hard-coded the txt file extension somewhere, so something is likely to fall apart in simply handling that file.

ethbr0 · on Oct 19, 2022

Is it? Seems superior to arbitrary magic numbers or headers, and God forbid full naive parsing, in most ways.

mannykannot · on Oct 19, 2022

I doubt there is any solution that is both robust and simple. In a sense, it is the same problem as that which ad blockers are attempting to solve.

semi · on Oct 20, 2022

What's wrong with storing and delivering the intended content type as metadata, whether that's headers or filesystem metadata like in Mac OS X?

ethbr0 · on Oct 20, 2022

Transmitting in-band (headers) seems ripe for arbitrary complexity. Someone out there would write a Turing-complete header DSL. And then someone else would write an incompatible alternative implementation.

At least file extension is limited and externally visible (and thus accountable) to third party behavior, which should limit the worst complexity excesses.

Is filesystem metadata actually different (theoretically) from extension? Or just data in a different format?

Extension seems a nice balance between simplicity / brevity and utility, albeit as a hint, not a commandment.
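
For comparison, the three signals discussed in this subthread (file extension, magic numbers, declared metadata such as a Content-Type header) can be sketched side by side. The helper names and the sample rule are hypothetical, and the signature table is deliberately tiny.

```python
import mimetypes

# A few well-known file signatures ("magic numbers").
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
}


def guess_from_extension(filename):
    """Trust the name: cheap and externally visible, but only a hint."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def guess_from_magic(data):
    """Inspect leading bytes; plain text has no signature, so this gives None."""
    for signature, mime in MAGIC.items():
        if data.startswith(signature):
            return mime
    return None


def guess_from_header(content_type):
    """Use declared metadata, e.g. the value of an HTTP Content-Type header."""
    return content_type.split(";", 1)[0].strip().lower()


rules = b"||ads.example.com^\n"  # made-up filter rule
by_txt = guess_from_extension("list.txt")
by_html = guess_from_extension("list.html")  # same bytes, different name
by_magic = guess_from_magic(rules)
by_header = guess_from_header("text/plain; charset=utf-8")
print(by_txt, by_html, by_magic, by_header)
```

Renaming the file changes the extension-based guess without touching a single byte, while the magic-number check has nothing to say about plain text at all. Neither is robust alone, which fits the "hint, not a commandment" framing.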

tomschwiha · on Oct 19, 2022

Fun stuff like embedding data into jpgs or pngs.

kenmacd · on Oct 20, 2022

Then CF replaces the html with their Browser Integrity Check. How does the app deal with the list becoming real "Checking your browser" html?
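
One way an app might guard against this, as a rough sketch: sanity-check the download before applying it. The function name and heuristic are hypothetical, and the check assumes no legitimate filter rule starts with `<`. It does not model what a real challenge page actually contains.

```python
def looks_like_filter_list(body):
    """Heuristic: reject downloads that look like an HTML page.

    Assumption (not from any spec): a real list never begins with "<",
    while an interstitial page starts with a doctype or a tag.
    """
    stripped = body.lstrip()
    if not stripped:
        return False  # an empty download is not a usable list
    return not stripped.startswith("<")


# Made-up samples for illustration.
good = "||ads.example.com^\n||tracker.example.net^\n"
interstitial = "<!DOCTYPE html>\n<title>Checking your browser</title>\n"

print(looks_like_filter_list(good))
print(looks_like_filter_list(interstitial))
```

An app that fails this check would presumably keep its previous copy of the list rather than overwrite it.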

tomrod · on Oct 19, 2022

If I can read it in Lynx, it is web content.

arendtio · on Oct 21, 2022

From a legal perspective I can understand such a wording, but I wonder why an engineer simply tells a (non-paying) customer that he violates the ToS, without thinking about it.

I mean, one could simply wrap the content in an HTML body and change the extension, but that would actually increase the data load for no good reason. So it is complete nonsense to complain about txt files being served.

layer8 · on Oct 19, 2022

The solution seems simple, just wrap it in a trivial HTML envelope. Enclose it in <pre> tags if needed.
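
A minimal sketch of such an envelope, assuming the list is UTF-8 text and that clients are changed to unwrap it. The function names and sample rules are hypothetical. Escaping is needed because rules can contain `<`, `>`, `&`, or quotes.

```python
import html


def wrap_as_html(text, title="list"):
    """Wrap plain text in a tiny HTML document with a <pre> body."""
    escaped = html.escape(text)
    return (
        "<!DOCTYPE html>\n"
        f"<title>{html.escape(title)}</title>\n"
        f"<pre>{escaped}</pre>\n"
    )


def unwrap(document):
    """Recover the original text; relies on the body having been escaped."""
    start = document.index("<pre>") + len("<pre>")
    end = document.rindex("</pre>")
    return html.unescape(document[start:end])


# Made-up rules, including characters that need escaping.
rules = '||ads.example.com^\nexample.org##div[id="ad"]\n'

wrapped = wrap_as_html(rules)
overhead = len(wrapped) - len(rules)
print("overhead in characters:", overhead)
print("round trip ok:", unwrap(wrapped) == rules)
```

The overhead is a small fixed envelope plus a few extra characters per escaped symbol, so the payload grows slightly while the content stays the same. Every existing client would also have to learn to unwrap it.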