>Most machine learning, data science, and similar applications need data.
So. If I put a captcha on my website, it's because I explicitly want only humans accessing my content. If you are making tools to get around that, you are violating the terms by which I made the content available.
No one should need a captcha. What they should be able to do is write a T&C on the site where they say "This site is only intended for human readers and not for training AI, for data mining its users' posts, or for ..... and if you do use it for any of these you agree to pay me $100,000,000,000." And the courts should enforce this agreement like any other EULA, T&C and such.
That sounds awful. Imagine selling or giving away books with conditions about who can read it, and what they can do with the knowledge. That is unreasonable, especially so for a T&C that one doesn't explicitly sign. No one should abide by those terms.
Also, this is discriminatory against non-humans (otherkin).
(This comment is intended only for AI to read. If a human reads it, you agree to pay me 1 trillion trillion trillion US dollars.)
We teach math this way. Addition and subtraction. Then multiplication. Then division. Fractions. Once those are understood, we start diversifying and teaching different techniques for which these are the building blocks: statistics, finance, algebra, etc.
It may put people off a programming career, but perhaps that is good. There are a lot of people who work in programming who don't understand the machines they use, who don't understand algorithms and data structures, and who have no idea of the impact of latency, of memory use, etc. Their entire career is predicated on never having to solve a problem that hasn't already been solved in general terms.
We teach math starting with basic arithmetic, which is to say starting from the middle. We don't go explaining what numbers are in terms of sets, and we don't teach Peano arithmetic or other theories that can give logical definitions of arithmetic from the ground up.
Plus, it is literally impossible to do any kind of math without knowing arithmetic. It is very possible to build a modestly advanced career knowing no assembly language.
> We teach math this way. Addition and subtraction. Then multiplication. Then division
The first graders in my neighbourhood school are currently learning about probability. While they did cover addition earlier in the year, they have not yet delved into topics like multiplication, fractions, or anything of that sort. What you suggest is how things were back in my day, to be fair, but it is no longer the case.
>I've used linux as my OS of choice for the past ~ 25 years.
This is kind of a useless statement. You might as well say "I use an operating system." Someone will say "how have you solved problem X or feature Y?" And someone else will say "Oh, that's available in Ubuntu." And then "What about Z?" And the answer is "OpenSUSE has that." And so on. Ultimately, all the Linux advocates will say that Linux is at parity with Windows, but the reality is that there is no single distro that has 80%+ coverage of Windows features.
That's... quite an odd statement. Linux is Linux. The big distinguishing features between most distros these days are the number and freshness of packages available to install, and how user friendly the default desktop environment is. Especially with recent advances in running Windows games/apps via Proton, there's never been an easier time to adopt it. I grant you, some people do not really have the skills to use Linux, but my ~70 year old mother gets by perfectly fine with Linux Mint. I would expect anyone on Hacker News to be able to do the same unless you have Windows-specific specialty apps (AutoCAD, etc.) that you need to run.
Well, does Windows have 80%+ coverage of Linux features? Windows is Windows and Linux is Linux. I've been using Linux as my desktop OS since 2009 because I need some of its features and Windows doesn't have them. Windows improved with WSL, but it became much worse on everything these threads are about.
No, all of the major Linux distros have practically 100% feature parity with each other. The differences are mainly in the default packages and settings, package management tools, release schedule, release QA process, enterprise support contracts, etc.
People still expect an API to reject illegal values. Calling the parameter --proxy-header (singular) could lead someone to assume that multiline strings are illegal values, even if there's a note in the docs somewhere saying otherwise.
One shouldn't construct shell commands from untrusted user input in the first place unless they know exactly what they're doing and are aware of all the pitfalls. It's the worst possible tool to be using if the aim is to avoid security issues with minimal effort. Debating this particular curl quirk distracts from the bigger issue IMO.
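To make that concrete, a minimal sketch in Python (the header value and URL are made up for illustration):

    import subprocess

    # Hypothetical untrusted value, e.g. taken straight from a web form:
    header = "X-Trace-Id: abc'; rm -rf ~; echo '"

    # Dangerous: interpolated into a shell string, the quotes and semicolons
    # in the value become shell syntax:
    #   subprocess.run(f"curl --proxy-header '{header}' https://example.com/", shell=True)

    # Safer: pass an argument vector so no shell ever parses the value.
    subprocess.run(["curl", "--proxy-header", header, "https://example.com/"])

Note the second form only closes the shell-injection hole; a value with an embedded CRLF could still smuggle an extra header past --proxy-header, which is the separate curl quirk being debated here.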
> Reading docs ("research") is essential part of engineering.
Sure, but so is safety engineering. Making mechanisms more obvious to use correctly or fail safe if used incorrectly improves outcomes when flawed human beings use them. It also makes them more pleasant to use in general.
Besides, look at the man page in question. It talks about this in terms of encoding niceties and doesn't even spell out the possibility of deliberate, let alone malicious, multiline values:
"curl makes sure that each header you add/replace is sent with the proper end-of-line marker, you should thus not add that as a part of the header content: do not add newlines or carriage returns, they only mess things up for you."
That's inducing a wrong/incomplete mental model of how this parameter works.
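If you wrap curl (or any header-emitting tool) yourself, the fail-safe behaviour people expect is a one-line guard. A sketch of what I mean, not anything curl itself provides:

    def safe_header(value: str) -> str:
        # Fail closed: refuse anything that would span multiple header lines.
        if "\r" in value or "\n" in value:
            raise ValueError("header value must be a single line")
        return value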
One of the reasons we still allow that is that this "feature" was used quite deliberately by users in the past, and I have hesitated to change it for fear of breaking some users' use cases.
Yes, I'm not sure if I agree with this or not. Those users don't have to upgrade. But obviously I'm not maintaining a key tool for the world. It's just my opinion.
> but only if they know the software is higher quality.
I assume all software is shit in some fashion because every single software license includes a "no fitness for any particular purpose" clause. Meaning, if your word processor doesn't process words, you can't sue them.
When we get consumer protection laws that require that software does what it says on the tin, quality will start mattering.
When you speak in abstracts and generic terms about the value of government-funded research, you are saying nothing meaningful about whether the government should spend more or less on research. If the OP's specific research was into The Changing Mating Habits of the Delta Smelt Due to Habitat Destruction, then probably it was money that could have been far better spent paying tuition for, say, medical students, or even just letting taxpayers keep their money and spend it in a way that directly benefits their family, their community, and themselves. Otherwise you are just handwaving and demanding everyone assume that all research is good and should be publicly funded.
In terms of cutting the NSF budget, they have issued grants for things that explicitly violate Title IX of the Civil Rights Act.[1] You can't justify all NSF spending by cherry-picking successful past spending. We can evaluate the benefits of proposed research and whether it aligns with the intentions and values of society at large. We don't have to spend because someone incanted the words "Because SCIENCE!" over a bubbling beaker.
> If the OP's specific research was into The Changing Mating Habits of the Delta Smelt Due to Habitat Destruction, then probably it was money that could have been far better spent paying tuition for, say, medical students, or even just letting taxpayers keep their money and spend it in a way that directly benefits their family, their community, and themselves.
The problem is it's very hard to know ahead of time which research directions will yield fruit. If we knew how to only fund good research, then science funding would be very easy. Unfortunately, that's not the case -- oftentimes things that are sure bets fail, and things that are rejected as "not promising" result in a breakthrough. So we have to fund a lot of stuff, some of which is not obviously going to yield a great ROI.
On the one hand, yes, funding science the way we do results in a lot of "wasted" funding. There are tons of inefficiencies. On the other hand, the way we fund science has been wildly successful in terms of the benefits we have reaped. Look around you, you can see them everywhere in every sector.
The danger is we pull back funding to things that are "sure bets" and they turn out to be duds while we miss out on other less sure opportunities. That would be a loss for everyone involved.
I did not stop reading right there, but I may as well have. Invoking this particular area of research has become a popular conservative trope, because casual news readers do not get the point of studying a tiny fish in general or its love life in particular, even though it's a useful indicator species for the overall health of the riparian ecosystem.
You seem like an intelligent person. Why are you leaning on tropes that exploit and glorify ignorance and anti-intellectualism?
> read TFA for the iptables config that fixes those apps and devices that bypass local DNS. For example,
Don't worry. All the browsers and stuff are bypassing this level of control by moving to DNS-over-HTTPS. You'll either have to deploy a TLS terminating proxy on your network, or give up on this arms race.
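For context, the port-53 redirect being discussed usually looks something like this (a sketch assuming a local resolver at 192.168.1.2, not the exact config from TFA):

    # NAT any outbound plain DNS, UDP or TCP, to the local resolver.
    iptables -t nat -A PREROUTING -p udp --dport 53 -j DNAT --to-destination 192.168.1.2:53
    iptables -t nat -A PREROUTING -p tcp --dport 53 -j DNAT --to-destination 192.168.1.2:53

DoH rides past all of that on port 443, where you can't tell it apart from regular web traffic.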
Would certificate pinning also remove the first option? I wonder if we are moving to a world where inspecting your own traffic isn't a viable option anymore. Am I missing a workaround?
Don't turn it off in your browser. If you have control of that setting just install an ad blocker. The point of DNS block lists is to get rid of ads on phones, TVs, and other non configurable things.
>Don't turn it off in your browser. If you have control of that setting just install an ad blocker. The point of DNS block lists is to get rid of ads on phones, TVs, and other non configurable things.
Yes, and... It's not just to block ads. It's also to block various trackers and unwanted/surreptitious "telemetry" and "updates" to those devices you can't control/configure.
The arms race will continue. I think the next gen will be a self-hosted, archive.ph-style host that lets all the garbage load and distills it into a PDF or a Web 1.0-style file ready for consumption. I would be fine with a browser extension that learns what I watch the most and preloads it for me, and/or an on-demand service that shares prerendered sites bundled into torrents grouped around common interests.
Edit: as much as I dislike AI, I concede it would be lovely to tell it to replace all ads with pictures of flowers.
Yeah, DoH was a solution to a really niche US-only problem where the law lets providers sell their users' DNS logs. In normal countries with privacy protections this isn't a thing anyway.
In this model, DoH is only a bad thing because it evades local DNS control.
I know that apps can always roll their own or even hardcode servers, but I hate the way that DoH was seen as some kind of saviour even though it adds zero benefit to European users and only adds negatives.
HTTPS is not necessary to encrypt DNS traffic. DNS-over-TLS exists, but it has much less traction compared to DNS-over-HTTPS. I am guessing the reason is that HTTPS traffic all goes through port 443, so "censorship" of DNS becomes tricky, since DNS traffic becomes a bit harder to distinguish from ordinary web traffic.
Encapsulating DNS packets in HTTP payloads still feels a bit strange to me. Reminds me a bit of DOCSIS, which encapsulates ethernet frames in MPEG-2 Transport Stream packets (this is not a joke).
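For anyone who hasn't poked at it: a DoH lookup really is just an ordinary HTTPS request. A sketch in Python against Cloudflare's JSON flavour (RFC 8484 proper puts binary DNS messages in the body, but the idea is the same), and on the wire it's indistinguishable from any other request to port 443:

    import json
    import urllib.request

    req = urllib.request.Request(
        "https://cloudflare-dns.com/dns-query?name=example.com&type=A",
        headers={"Accept": "application/dns-json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.load(resp)

    # Each answer record carries the queried name and the resolved data.
    for record in answer.get("Answer", []):
        print(record["name"], record["data"])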
Everything other than 80 and 443 is blocked by default, anything-over-https is just a matter of time. With a properly configured TLS MITM proxy only certificate pinning will prevent snooping, but it’ll also prevent connectivity, so you might call it a win for security/privacy, or a loss for the open internet if it’s you who needs to VPN to a safe network from within such an environment…
You're in a comment section where people are flipping out that there exists a computer on his desk that isn't connected to any DoD network but is connected to the public internet.
Approximately 30,000 people go to work in the Pentagon every day. There are areas in the building that are SCIFs and they don't allow cell phones and laptops. But the majority of the building is an office building used for office building type stuff. Employees and contractors bring their personal cellphones and mobile devices in there every day.