It is much easier to use a HOSTS file as a whitelist rather than some sort of blacklist.
HOSTS is useful but limited. For example, unlike DNS, it does not allow wildcards.
Unbound is included in many distributions nowadays, and it has plenty of features that let it act like a HOSTS file or an authoritative server. These work well for ad blocking.
Blocking ads is like blocking traffic using a firewall. Firewall rulesets often block everything by default and then lines are added to whitelist desired traffic. This can be easier to manage than allowing every domain by default and trying to come up with a list of all undesired domains. The same firewall-like approach has worked well for me in blocking ads. All domains blocked by default; desired domains are whitelisted.
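To make the default-deny idea concrete, here is a minimal sketch in Python. The whitelist contents and the function name are hypothetical placeholders of mine, not anything taken from Unbound or a real setup; the point is only that anything not explicitly listed is refused.

```python
from typing import AbstractSet

# Hypothetical whitelist; these domains are placeholders, not a recommendation.
WHITELIST = {"example.com", "news.example.org"}


def is_allowed(hostname: str, whitelist: AbstractSet[str] = WHITELIST) -> bool:
    """Default-deny: a name is allowed only if it is explicitly whitelisted."""
    # DNS names are case-insensitive and may carry a trailing root dot.
    name = hostname.strip().lower().rstrip(".")
    return name in whitelist


for host in ("example.com", "ads.example.com", "tracker.invalid"):
    print(host, "->", "allow" if is_allowed(host) else "block")
```

Note that the match is exact on purpose: a subdomain such as ads.example.com stays blocked until it is added by hand.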
If you use Chrome browser, it will even help you formulate your whitelist. Go to chrome://site-engagement after some routine browsing.
You might find there are some shocking entries in those massive blocking HOSTS files popular on the internet if you ever choose to read one. Sites you will never, ever visit in your lifetime online. Grossly inefficient.
It also appears sections have been cut and pasted from a variety of disparate sources without any sort of verification.
I tried to read through one of these massive HOSTS files once and had to stop as I found it too repulsive. There were far too many dark corners of the web listed that the average web user will never visit. Makes one wonder how the authors even know about these domains.
People's browsing habits are not all the same. A "one-size fits all" HOSTS file seems inappropriate.
Sounds interesting, care to elaborate a bit? How do you deal with, eg: CDNs? Whitelist *.cloudfront.net, I suppose? How often do you revisit your whitelist?
I have found I can block cloudfront domains by default with almost no inconvenience.
Occasionally something like a download link, where the webmaster has chosen to use cloudfront for that specific resource, might require that I whitelist a cloudfront domain temporarily. If the domain has a unique subdomain and I am confident no ads are ever served from that subdomain, I might whitelist it permanently.
Every user is different and visits different websites. Each user's needs are to some extent unique. I think you have to find what works for you. No one can do this for you.
The more engaged you become in blocking ads, and the less you rely 100% on a third party to try to take care of it for you, the more familiar I think you become with exactly which domains you need to access to accomplish whatever it is you are doing on the web. That knowledge allows you to make your whitelist.
Meanwhile anyone using Chrome can tap into the built-in diagnostics via chrome://chrome-urls to get a very quick and easy analysis of what domains they are requesting and the ones they actually need:
chrome://site-engagement
To answer the second question, if I am visiting new sites, then the whitelist is modified accordingly. Otherwise I have found the majority of IP addresses to be quite stable. If I am visiting many random websites, eventually I will find one or two that are changing their address either periodically or permanently.
Personally I like to know if websites are changing their IP address. I think there can be good and bad reasons for changing IP address. When one is using whitelisting instead of unrestricted recursive queries to a DNS cache then it becomes easy to identify websites that are changing IP address and to monitor the changes.
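One way to picture that monitoring is a small Python sketch that compares the addresses stored in a whitelist with freshly resolved ones. The function names and the stored entry are assumptions for illustration (the addresses come from the documentation range 192.0.2.0/24); this is not how any particular DNS tool does it.

```python
import socket


def resolve(hostname):
    """Return the sorted addresses the system resolver currently reports."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def changed_addresses(known, current):
    """Compare stored addresses with fresh ones; return (added, removed)."""
    known_set, current_set = set(known), set(current)
    return sorted(current_set - known_set), sorted(known_set - current_set)


# Hypothetical stored whitelist entry and a hypothetical fresh answer.
known = {"example.com": ["192.0.2.10"]}
fresh = ["192.0.2.10", "192.0.2.77"]  # in practice: resolve("example.com")

added, removed = changed_addresses(known["example.com"], fresh)
print("added:", added, "removed:", removed)
```

Running something like this on a schedule and logging the differences would give the kind of change history described above.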
TIL...While i don't spend much time in chrome's configs and settings, i liked peering into the results of my list when viewing chrome://site-engagement
I have been using this custom host file for a few months and it works like a charm. Just have to update it from time to time (but it can be automated).
"This repository consolidates several reputable hosts files, and merges them into a unified hosts file with duplicates removed. A variety of tailored hosts files are provided."
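As a rough idea of what "merged with duplicates removed" involves, here is a Python sketch. It is my own simplified illustration, not the repository's actual code, and the sample entries are made up.

```python
def parse_hosts(text):
    """Yield (address, hostname) pairs from hosts-format text."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        for name in fields[1:]:
            yield fields[0], name.lower()


def merge_hosts(*sources):
    """Merge several hosts texts, keeping the first entry seen per hostname."""
    merged = {}
    for text in sources:
        for address, name in parse_hosts(text):
            merged.setdefault(name, address)
    return "\n".join(f"{address} {name}" for name, address in merged.items())


# Hypothetical input lists.
list_a = "0.0.0.0 ads.example.com\n0.0.0.0 tracker.example.net  # seen in logs\n"
list_b = "0.0.0.0 ADS.example.com\n127.0.0.1 banner.example.org\n"
print(merge_hosts(list_a, list_b))
```

Here the duplicate ads.example.com entry from the second list is dropped because hostnames are compared case-insensitively.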
I wonder if you could circumvent the hosts method by rotating through unique subdomains as your ads server. My understanding is that you can't wildcard the hosts file.
Yes, hosts files do not support wildcards. There are some solutions for blocking advertisers that use tricks like the one you suggest. A PiHole is able to do wildcard blocking. Also, uBlock Origin (which accepts hosts-formatted lists) will automatically block any subdomain of a blocked root. So as long as the parent domain is also blocked, any subdomains would also be included.
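The "block the parent, get the subdomains for free" behaviour can be sketched in a few lines of Python. This is only an illustration of the matching idea with made-up domains; it is not taken from PiHole or uBlock Origin.

```python
def is_blocked(hostname, blocked_roots):
    """Return True if hostname or any of its parent domains is blocked."""
    labels = hostname.lower().rstrip(".").split(".")
    # For "a.ads.example.com" check "a.ads.example.com", "ads.example.com",
    # "example.com" and finally "com".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocked_roots:
            return True
    return False


blocked = {"ads.example.com"}  # hypothetical blocked root
for host in ("x9f3.ads.example.com", "ads.example.com", "www.example.com"):
    print(host, "->", "blocked" if is_blocked(host, blocked) else "allowed")
```

Because the comparison is done on whole labels, rotating through random subdomains of a blocked root does not help the advertiser, while a lookalike such as notads.example.com is not caught by accident.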
A good combination is uBlock Origin and Nano Defender (both correctly configured, there are steps you can follow online). uBlock Origin does a good job of blocking most stuff, and Nano Defender does a good job of stopping sites from detecting you have blocked their adverts, thus stopping the website from displaying a "Hey, you have an AdBlock, we need adverts to keep this site free. Disable your AdBlock and refresh to view this content".
Am I the only one that likes the "Hey, you have an AdBlock" popups?
They come up and I spend a few seconds deciding if it's important to me to read what is behind it, and 95% of the time that answer is "no". Saves me a TON of time. :-)
Haha. On a slight tangent, you could create a 'business' around - "The perfect home security system before you go on holiday". Enter your name, address, and last date you need it installed by for a quote.
I disabled the ad-blocker on a Liverpool Echo page because I wanted to watch the video. 421 cookies and one reboot later I was able to watch the advert before the video and then the 50 second video clip.
I presume that the 421 cookies are tracking something, only a hundred or so go to the Liverpool Echo, the others go to 20 or so other places. Nonetheless there are not many people reading local papers online, it is too much effort wading through the junk that gets downloaded. 6 megabytes to display 15 sentences and a video embed is a bit much.
In the olden days the newspapers were read by many people. Nowadays the newspaper readers are 'read' by many people. It has gone back to front.
How often does anyone here see a link to a newspaper and think to jump straight to the comments in order to see if the article is worth reading? For me this does not happen if the link is to a blog or other site likely to be sensible with the inline spam.
The sooner this ad-spam business dies off the better.
My local weekly paper has very little content, and they want £2 for it, that’s too rich.
Oddly they put every story on Twitter. And email me about it. God knows where they get the money from.
I do weep for the lack of coverage of local democracy though. Where journalism dies, political manipulation and blatant lies run rife. All we have left is Private Eye to cover the most egregious cases.
Unless you're using a platform where you can't run an ad blocker (and I can't think of any), a hosts file (or a pihole) is a ham-fisted approach compared to having uBlock Origin.
I used this before I switched to Pi-hole. It worked quite well in combination: hosts file + uBlock Origin + uMatrix. One thing though, more and more sites now serve ads and content from the same domain, meaning if you block ads at DNS level you'll block the content too.
I run Pi-hole too; it can handle much more than the hosts file of a Windows computer. It has been a while since I used the hosts file to block ads, but at that time the computer could lock up for quite a while now and then, and the problem disappeared when I cleared the hosts file again.
It's really neat to get automatic protection for all your devices at the same time with the Pi-hole.
Just add uBlock to the browser to remove the remaining ads and get a much smoother web experience without distractions :-)
also: hosts file can cause problems with things like Windows Update and other software that you might want to keep working. Pi-hole is easier to disable. I always forget that I installed some hosts file blocks with Blackbird ( https://www.getblackbird.net/ ) which is an otherwise pretty nice tool (aside from that and how it's unclear if you're enabling or disabling something, since the switches "toggle" something instead of expressing that you want to disable or enable it specifically).
+1 for pihole; rPis / odroids / SBCs / NUCs / home servers are easy enough to run that it's worth it.
It's kind of crazy how we've been playing cat-and-mouse games between ads and ad-blockers for over a decade and yet websites still serve ads from third party domains. If they started serving ads from their own domain and randomized the IDs of elements, then they would be much harder to block.
It's true both in Windows and Linux that it's a non-routable address. It's false both in Windows and Linux that it's an invalid address. It's also false that 0.0.0.0 is "the same as 127.0.0.1" in general. That it's a valid but non-routable address makes it a good address for applications to assign a special purpose to. You'll find that in some cases 0.0.0.0 means localhost, but in other cases it has other meanings.
For example, you might be in for a nasty surprise if you assume that "nc -l 0.0.0.0 1234" is equivalent to "nc -l 127.0.0.1 1234".
By golly, Linux does map 0.0.0.0 to localhost. That produced a bunch of searches to try to find out why it does that. Nothing found. At this point I strongly suspect that Linux is simply exhibiting incorrect behaviour...
0.0.0.0 is the address programs will listen on to be able to respond to any IP address assigned to the system.
When you are setting up a socket to listen for connections on a particular port you would specify 0.0.0.0 so then things can connect from anywhere like localhost or on any of the many possible IP addresses assigned to the machine, or you can specify a particular IP address and only be able to get traffic from that. For example if you wanted a program only reachable from the same machine you could listen on localhost (127.0.0.1) and then nothing external could directly connect to that particular service.
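A small Python sketch of the difference, using only the standard socket module. The helper name is mine, and port 0 is used so the OS picks any free port; treat it as an illustration rather than a recipe for a real service.

```python
import socket


def listen(address):
    """Open a TCP listener on `address` with an OS-chosen port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((address, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    return server


any_addr = listen("0.0.0.0")    # accepts connections on every local IPv4 address
loopback = listen("127.0.0.1")  # accepts connections from the same machine only

print("wildcard listener bound to", any_addr.getsockname()[0])
print("loopback listener bound to", loopback.getsockname()[0])

# A local client can reach the wildcard listener through the loopback address.
client = socket.create_connection(("127.0.0.1", any_addr.getsockname()[1]), timeout=2)
client.close()
any_addr.close()
loopback.close()
```

The wildcard listener would also answer on the machine's LAN address, which is exactly what the loopback listener refuses to do.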
But unlike rooting and using Adaway, all the other options on Android act as a VPN, and prevent you from using any other VPN, which makes them less handy.
I use Blokada on my phone, which runs an adblocker as a local-device VPN; a neat trick to do this without needing to root the phone (although mine is also rooted).
Unfortunately, you can only use one VPN at a time on Android. I'm not sure how you would go about blocking ads on an unrooted phone while simultaneously using a VPN. Samsung phones do have a workaround using knox, but it requires re-generating a developer key every few months and is too much trouble for most people.
Yes, and these aren't that easy to manage, but it is doable, and a large, really helpful community makes it easier. Some tools allow for regex, wildcards and similar.
The bigger issue comes from the likes of Google/YouTube/Facebook, who host their ads on the same domain as their main website; ergo, if you want to block the ad domains, you'll block the main domains as a whole. In this case, the only way to block ads is through an in-browser addon.
A PiHole could do wildcard blocking for the subdomain - but as in the ticket where the content for the site is also served from the same encrypted subdomains - nothing can be done. uBlock Origin filters also fail at blocking these requests. After some research, I found a potential solution is to block based on request headers, since the ad tool is using headers as a way to send data. Unfortunately I'm unaware of any browser-based tool that is able to block requests based on header content.
It's very interesting that this encrypted subdomain tool is only enabled in Chrome and not Firefox. It will also detect if the developer tools are open or not. WebMD is a good example where this tool is being used.
There's a nice and maintained host file here which blackholes most ad sites: https://someonewhocares.org/hosts/. As a bonus, it blackholes some shock sites as well.
How big would the list need to get before it starts affecting performance? There is obviously some kind of lookup for every HTTP request against the hosts file. I assume the hosts file is converted into some sort of hash list?
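Whether a given operating system actually builds such a structure is a separate question, but for illustration, this is roughly what a hash-based lookup over hosts entries would look like in Python. The function names and sample entries are hypothetical.

```python
def build_index(hosts_text):
    """Build a hostname -> address dict (a hash table) from hosts-format text."""
    index = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if len(fields) >= 2:
            for name in fields[1:]:
                index[name.lower()] = fields[0]
    return index


def lookup(index, hostname):
    """Average constant-time lookup; None means 'not in the hosts file'."""
    return index.get(hostname.lower())


sample = "0.0.0.0 ads.example.com banners.example.com\n# comment line\n"
index = build_index(sample)
print(lookup(index, "ADS.example.com"))
print(lookup(index, "www.example.com"))
```

With a structure like this, the cost per lookup stays roughly flat as the list grows; a resolver that rescans the file line by line would instead slow down in proportion to its size.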
The problem is, some (poorly written) websites don't work without the ads. Sometimes you just don't care and close the tab, but sometimes you don't have a choice, and in that case disabling the hosts file is a bit of a hassle. I prefer simple extensions like uBlock Origin which do all the work for me and that I can enable/disable as needed.
I've seen a lot of websites that don't work without scripts in my time, but never one that doesn't work without ads.
It would be possible to make one like that by hosting your content and your ads on the same domain, that would trip up naive hostfile blockers, but of course if companies were doing this quite a lot of people who habitually block ads wouldn't mind them doing so, since one of the key complaints against ads is data harvesting by third party ad providers.
I've seen websites being broken because they load some ad js, and when it fails it throws a js error which prevents the rest of the script from working. Also some websites wrap their outside urls in tracking urls, and these break as well with ad blockers.
The ones that explicitly detect adblockers and refuse to show content are usually sites that deal with more... shady material. When I need something from one of those, I find that Google's text-only cache is often enough to get the content, and if not it's really a question of how much the content is worth to me --- the back button is only a click away. What I won't do is enable JS, however; I'd sooner reverse-engineer the script and figure out how to get the content it loads than let dubious arbitrary code run.
But like I said, the back button is effortless and if your content is not rare, I'm going elsewhere.
This is all well and good until Google decides to force the use of DNS-over-HTTPS and completely bypasses the host operating system. Browsers have also done this for certificate trust lists. This takes more and more power away from the users.
One downside to this approach is that you still see where the banner was with an "address not found" block. I switched to uBlock origin some time ago which I prefer as 1) it collapses the ad blocks so you never realise they were there, and 2) it auto-updates the block lists for you.
I would say, that rather is an advantage, not a disadvantage. It is good to know after all that /something/ happened so you know that a page might be broken in some way instead of it failing silently behind your back.
That's true if you're prepared to update the file yourself all the time. In my experience there are a lot of URLs to maintain and I am ok with offloading that trust to a 3rd party like Easylist who will maintain the list for me.
Admittedly, I do occasionally have to turn the adblocker off to get a site to work, but this is maybe once a month.
This is the reason I haven't installed Pi-hole. I understand that a broken site may be because of the adblocker and can turn it off in my browser, but a less tech savvy user may not know this. And if they are on a Pi-hole network they won't know or be able to turn it off (I understand there is a whitelist but I believe this is only configured by an admin - could be wrong here).
I would imagine very little, and as far as I can tell the resolver checks /etc/hosts first and only then queries DNS, so the hosts file takes precedence.
I too use Steven Black's hosts file. I can tell when I forget to implement it by the sound of my cpu fan. That said I'm fighting with one big limitation, and that's the fact that I do understand that some sites are ad supported and I'd like to support those sites. I wish there was a way with the hosts method to enable ads for just those sites without also enabling all the tracking that goes with it.
Yes, sadly ads and tracking have become the same. While I certainly don't enjoy ads and have reservations about the ethics of ads altogether - I'm 100% totally against tracking, profiling, and targeting. I allow ads on DuckDuckGo since they are related to what I'm actively searching - but other than that I block all ads since I know they are also tracking me.
If you block using a hosts file, the ads will be requested from an IP address, thus skipping a DNS lookup.
If certain IP addresses start getting blocked, they'll move to IPv6 and have an infinite dynamic supply, which are randomly picked as the web page is served.
It is an arms race.
Also: advertising ruins every medium it ever touches. There is no self policing or sense of restraint or any line that could be crossed leading to a feeling of shame.
It is an arms race, but it's worse than the progression you listed. Since the easy solution to IPv6-hosted ads is to just block IPv6 (plus anyone without IPv6 wouldn't see the ads), they just randomly generate div elements with random names, making it impossible to distinguish the ads from real content.
When I see things like this I think of the average user who may not be technically savvy enough to implement this type of configuration. If Google blocks things like uBlock Origin and ABP, how can we help less tech-savvy users have an ad-free experience?
>how can we help less tech-savvy users have an ad-free experience
I'd advocate for encouraging adoption of more flexible browsers, if it comes to that. Firefox is far from perfect (see Looking Glass), but for less tech-savvy users it's probably the best option (and you'll get better results than a hosts file or DNS server since the ad blocker can actually fix the page layout and modify elements, too!).
I have this running on the VPS I run a SOCKS proxy on and surf through. Works great.
It really speeds up sites with annoying ads or ads with scripts that hang.
The annoying thing is that it blocks direct links on deal sites etc.
I've had to give up on both which is a pain for me. Reason being that for some reason no connections work when it's running. As soon as I disable it internet works again. Haven't bothered figuring out what the issue is yet.
Yeah, totally. Set myself a Pi-hole + PiVPN up on a small Vultr instance. Routing all Smartphone traffic through it. Costs me 3.50 € a month and blocks half of all DNS requests. Haven't seen a single external ad since then.