
How will that prevent Google from training AIs on his data?

> actual DIMM failures.

Yep, hardware failures, electrical glitches, EM interference... All things that happen to real people every single day, in truly enormous numbers.

It ain't cosmic rays, but the consequences are still flipped bits.


Scale makes the uncommon common. Remember, kids: if she's one in a million, that means there are 11 of her in Ohio alone.
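
(A quick back-of-the-envelope in Python; both rates below are illustrative assumptions, not measured figures:)

```python
# "One in a million" at scale: tiny rates times big populations.
ohio_population = 11_800_000              # rough recent estimate
print(ohio_population / 1_000_000)        # ~11.8 "one in a million" Ohioans

fleet = 10_000_000                        # machines in a large fleet
p_bitflip_per_day = 1e-6                  # assumed per-machine daily flip rate
print(fleet * p_bitflip_per_day)          # ~10 silently corrupted machines/day
```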

> It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.

It reminds me of kids these days and their fancy calculators! Those newfangled doohickeys just aren't reliable, and the kids never realize that they won't always have a calculator on them! Everyone should just do it the good old-fashioned way with slide rules!

Or these darn kids and their unreliable sources like Wikipedia! Everyone knows that you need a nice solid reliable source that's made out of dead trees and fact-checked by up to 3 paid professionals!


I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably.

Sure, maybe someday LLMs will be able to report facts in a mostly reliable fashion (like a typical calculator), but we're definitely not even close to that yet, so until we are the skepticism is very much warranted. Especially when the details really do matter, as in scientific research.


> I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably

Reproducibility and repeatability in the sciences?

Replication crisis > Causes > Problems with the publication system in science > Mathematical errors; Causes > Questionable research practices > In AI research, Remedies > [..., open science, reproducible workflows, disclosure, ] https://en.wikipedia.org/wiki/Replication_crisis#Mathematica...

Already, verifiable proofs run to impossibly many pages for human review.

There are "verify each Premise" and "verify the logical form of the Argument" (P therefore Q) steps that still the model doesn't do for the user.

For your domain, how insufficient is the output given a process prompt like the following?

Identify hallucinations from models prior to (date in the future)

Check each sentence of this: ```{...}```

Research ScholarlyArticles (and then their Datasets) which support and which reject your conclusions. Critically review findings and controls.

Suggest code to write to apply data science principles to proving correlative and causative relations given already-collected observations.

Design experiment(s) given the scientific method to statistically prove causative (and also correlative) relations

Identify a meta-analytic workflow (process, tools, schema, and maybe code) for proving what is suggested by this chat


> whether the researcher's calculator was working reliably.

LLMs do not work reliably; that's not their purpose.

If you use them that way it's akin to using a butter knife as a screwdriver. You might get away with it once or twice, but then you slip and stab yourself. Better to go find a screwdriver if you need reliability.


I'm really not moved by this argument; it seems a false equivalence. It's not merely a spell checker or removing some tedium.

As a professional mathematician I used Wikipedia all the time to look up quick facts before verifying them myself or elsewhere. A calculator? Well, I can use an actual programming language.

Up until this point, neither of those tools was advertised or used by people to entirely replace human input.


There are some interesting possibilities for LLMs in math, especially in terms of generating machine-checked proofs using languages like Lean. But this is a supplement to the actual result, where the LLM would be producing a more rigorous version of a human's argument with all the boring steps included.
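
For readers who haven't seen one, this is roughly what "machine-checked" means in practice; a toy Lean 4 example (unrelated to any open problem):

```lean
-- Toy machine-checked statement: if this file compiles, the Lean kernel
-- has verified every inference step; no human line-by-line review needed.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```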

In a few cases, I see Terence Tao has pointed out examples of LLMs actually finding proofs of open problems unassisted. Not necessarily problems anyone cared deeply about. But there's still the fact that if the proof holds, then it's valid no matter who or what came up with it.

So it's complicated I guess?


I hate to sound like a 19 year old on Reddit but:

AI People: "AI is a completely unprecedented technology where its introduction is unlike the introduction of any other transformative technology in history! We must treat it totally differently!"

Also AI People: "You're worried about nothing, this is just like when people were worried about the internet."


The internet analogy is apt because it was in fact a massive bubble, but that bubble popping didn't mean the tech went away. Same will happen again, which is a point both extremes miss. One would have you believe there is no bubble and you should dump all your money into this industry, while the other would have us believe that once the bubble pops all this AI stuff will be debunked and discarded as useless scamware.

Well, the internet has definitely changed things; but it also wasn't initially controlled by a bunch of megacorps with the level of power and centralisation we see today.

☝️

> Those new fangled doohickeys just aren't reliable

Except they are (unlike a chatbot, a calculator is perfectly deterministic), and the unreliability of LLMs is one of their most, if not the most, widespread target of criticism.

Low effort doesn't even begin to describe your comment.


As low effort as you hand-waving away any nuance because it doesn't agree with you?

> Except they are (unlike a chatbot, a calculator is perfectly deterministic)

LLMs are supposed to be stochastic. That is not a bug; I can see why you find that disappointing, but it's just the reality of the tool.

However, as I mentioned elsewhere, calculators also have bugs, and those bugs make their way into scientific research all the time. Floating point errors are particularly common, as are order-of-operations problems, because physical devices get these wrong frequently and are hard to patch. Worse, they are not SUPPOSED TO BE stochastic, so when they fail nobody notices until it's far too late. [0 - PDF]

Further, spreadsheets are no better: a scan of ~3,600 genomics papers found that about 1 in 5 had gene-name errors (e.g., SEPT2 → "2-Sep") because that's how Excel likes to format things. [1] Again, this is much worse than a stochastic machine doing its stochastic job, because it's not SUPPOSED to be random; it's just broken, and on a truly massive scale.

[0] https://ttu-ir.tdl.org/server/api/core/bitstreams/7fce5b73-1...

[1] https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-al...
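
To make the floating-point point concrete, a minimal Python illustration of how deterministic arithmetic can still surprise (this is generic IEEE 754 behavior, not anything from the cited papers):

```python
import math

print(0.1 + 0.2 == 0.3)          # False: 0.1 + 0.2 is 0.30000000000000004
print(10**16 + 1.0 - 10**16)     # 0.0: the +1 vanishes below the precision floor

# Naive left-to-right summation drifts; math.fsum compensates for it.
vals = [0.1] * 10
print(sum(vals) == 1.0)          # False
print(math.fsum(vals) == 1.0)    # True
```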


That’s a strange argument. There are plenty of stochastic processes that have perfectly acceptable guarantees. A good example is Karger’s min-cut algorithm. You might not know what you get on any given single run, but you know EXACTLY what you’re going to get when you crank up the number of trials.
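
(For reference, the guarantee being invoked is the standard amplification bound from the textbook analysis of Karger's algorithm; a minimal sketch:)

```python
import math

def karger_failure_bound(n: int, trials: int) -> float:
    """Upper bound on P[no trial finds a min cut] for an n-vertex graph:
    a single contraction run succeeds with probability >= 2/(n(n-1))."""
    return (1 - 2 / (n * (n - 1))) ** trials

n = 100
T = math.comb(n, 2) * 10           # ~ (n choose 2) * ln(1/delta), delta = e^-10
print(karger_failure_bound(n, T))  # <= e^-10, about 4.5e-5: a precise guarantee
```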

Nobody can tell you what you are going to get when you run an LLM once. Nobody can tell you what you're going to get when you run it N times. There are, in fact, no guarantees at all. Nobody even really knows why it can solve some problems and can't solve others, except maybe it memorized the answer at some point. But this is not how they are marketed.

They are marketed as wondrous inventions that can SOLVE EVERYTHING. This is obviously not true. You can verify it yourself, with a simple deterministic problem: generate an arithmetic expression of length N. As you increase N, the probability that an LLM can solve it drops to zero.
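
A minimal sketch of that experiment; `ask_llm` is a hypothetical stand-in for whatever model API you'd wire in:

```python
import random

def random_expr(n_terms: int) -> str:
    """Random integer arithmetic expression with n_terms operands."""
    parts = [str(random.randint(1, 99))]
    for _ in range(n_terms - 1):
        parts += [random.choice(["+", "-", "*"]), str(random.randint(1, 99))]
    return " ".join(parts)

def accuracy(ask_llm, n_terms: int, trials: int = 100) -> float:
    """Fraction answered correctly; expect this to fall toward 0 as n_terms grows."""
    correct = 0
    for _ in range(trials):
        expr = random_expr(n_terms)
        truth = eval(expr)  # ground truth from Python's own arithmetic
        reply = ask_llm(f"Compute {expr}. Reply with only the integer.")
        try:
            correct += int(reply.strip()) == truth
        except ValueError:
            pass  # non-numeric reply counts as wrong
    return correct / trials
```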

Ok, fine. This kind of problem is not a good fit for an LLM. But which is? And after you’ve found a problem that seems like a good fit, how do you know? Did you test it systematically? The big LLM vendors are fudging the numbers. They’re testing on the training set, they’re using ad hoc measurements and so on. But don’t take my word for it. There’s lots of great literature out there that probes the eccentricities of these models; for some reason this work rarely makes its way into the HN echo chamber.

Now I’m not saying these things are broken and useless. Far from it. I use them every day. But I don’t trust anything they produce, because there are no guarantees, and I have been burned many times. If you have not been burned, you’re either exceptionally lucky, you are asking it to solve homework assignments, or you are ignoring the pain.

Excel bugs are not the same thing. Most of those problems can be found trivially. You can find them because Excel is a language with clear rules (just not clear to those particular users). The problem with Excel is that people aren’t looking for bugs.


> But I don’t trust anything they produce, because there are no guarantees

> Did you test it systematically?

Yes! That is exactly the right way to use them. For example, when I'm vibe coding I don't ask it to write code. I ask it to write unit tests. THEN I verify that the test is actually testing for the right things with my own eyeballs. THEN I ask it to write code that passes the unit tests.

Same with even text formatting. Sometimes I ask it to write a pydantic script to validate text inputs of "x" format. Often writing the text to specify the format is itself a major undertaking. Then once the script is working, I ask for the text and tell it to use the script to validate it. After that I know I can expect deterministic results, though it often takes a few tries for it to pass the validator.
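
As an illustration (the schema below is invented; the actual "x" format isn't specified), the pydantic gate might look something like this:

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class Record(BaseModel):
    """Hypothetical stand-in for the unspecified "x" format."""
    title: str = Field(min_length=1, max_length=120)
    tags: list[str]

    @field_validator("tags")
    @classmethod
    def tags_lowercase(cls, tags: list[str]) -> list[str]:
        # Example rule; a real format would encode its own constraints here.
        if any(t != t.lower() for t in tags):
            raise ValueError("tags must be lowercase")
        return tags

def accept(raw: dict) -> Record | None:
    """Deterministic gate: LLM output either parses or gets re-prompted."""
    try:
        return Record.model_validate(raw)
    except ValidationError:
        return None  # caller feeds the error back to the model and retries
```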

You CAN get deterministic results, you just have to adapt your expectations to match what the tool is capable of instead of expecting your hammer to magically be a great screwdriver.

I do agree that the SOLVE EVERYTHING crowd are severely misguided, but so are the SOLVE NOTHING crowd. It's a tool, just use it properly and all will be well.


One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.

I do think they can be used in research but not without careful checking. In my own work I’ve found them most useful as search aids and brainstorming sounding boards.


> I do think they can be used in research but not without careful checking.

Of course you are right. It is the same with all tools, calculators included, if you use them improperly you get poor results.

In this case they're stochastic, which isn't something people are used to from computers yet. You have to understand that and learn how to use them, or you will get poor results.


> One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.

I made this a separate comment, because it's wildly off topic, but... they actually aren't. Especially for very large numbers or for high precision. When's the last time you did a firmware update on yours?

It's fairly trivial to find lists of calculator flaws and then identify them in research papers. I recall reading a research paper about it in the 00's.


One issue with this analogy is that paper encyclopedias really are precise when used correctly. Wikipedia is not.

I do think it can be used in research but not without careful checking. In my own work I've found it most useful as a search aid and for brainstorming.

^ this same comment 10 years ago


Paper encyclopedias were neither precise nor accurate. You could count on them to give you ballpark figures most of the time, but certainly not precise answers. And that's assuming the set was new; in practice, most encyclopedias people actually encountered were several years old at least. I remember the encyclopedia set I had access to in the 90s was written before the USSR fell.

> I do think it can be used in research but not without careful checking.

This is really just restating what I already said in this thread, but you're right. That's because Wikipedia isn't a primary source and was never, ever meant to be. You are SUPPOSED to go read it, then click through to the primary sources and cite those.

Lots of people use it incorrectly and get bad results because they still haven't realized this... all these years later.

Same thing with treating stochastic LLMs like sources of truth and knowledge. Those folks are just doing it wrong.


Annoying dismissal.

In an academic paper, you condense a lot of thinking and work, into a writeup.

Why would you blow off the writeup part, and impose AI slop upon the reviewers and the research community?


I don't necessarily disagree, but researchers are not required to be good communicators. An academic can lead their field and be a terrible lecturer. A specialist can let a generalist help explain concepts for them.

They should still review the final result though. There is no excuse for not doing that.


I disagree here. A good researcher has to be a good communicator. I am not saying you necessarily don't understand a topic if you can't explain it to someone new, but communicating is essential to a good exchange of ideas with others, and consequently to becoming a better researcher. This is one of the skills you learn in a PhD program.

That is how it should be, yes. Do PhDs always meet that standard though? No.

It's being downvoted because it's a ridiculous premise. "The Elites" are human too. This attitude is nonsensical and child-like. Nobody is out here trying to round up the hippies and force them to live in some kind of pods to be harvested for their nutrients or whatever.

This technology, like every prior technology, will cause some people to lose their jobs and some new jobs to be created. This will annoy people who have to learn new skills instead of coasting until retirement as they planned.

It is no different than the buggy whip manufacturers being annoyed at Henry Ford. They were right that it was bad for their industry, but wrong about it being the death of... well all the million things they claimed it would be the death of.


And just like Henry Ford and the automobile, one of many externalities was the destruction of black communities: white flight that drained wealth, eminent domain for highways, and increased asthma incidence and other disease from concentrated pollution.

Yet, overall it was a net positive for society... as almost every technological innovation in history has been.

Did you know that two-thirds of the people alive today wouldn't be here if it weren't for the invention of the Haber-Bosch process? Technology isn't just a toy, it's our life support mechanism. The only way our population gets to keep growing is if our technology continues to improve.

Will there be some unintended consequences? Absolutely. Does that mean we can (or even should) stop it? Hell no. Being pro-human requires you to be pro-technology.


I don't think this argument is logically sound. The assertion that this (and every other!!) technological innovation is a "net positive" merely because of our monotonic population growth is both weakly defined and unsubstantiated. Population is not a good proxy for all things we find desirable in society, and even if it were, it is only a single number that cannot possibly distinguish between factors that helped it and factors that hurt it.

Suppose I invent The Matrix, capable of efficiently sustaining 100 billion humans provided they are all strapped in with tubes and stuff. Oh, and no fancy simulation to keep you entertained either; it's only barely an improvement on death. Economics forces everyone into matrix-hell, but at least there's a lot of us. Net positive for society?


Human fecundity is probably not actually the meaning of life, it's just the best approximation most people can wrap their heads around.

If you can think of a better one, let me know. Be warned though, you'll be arguing with every biological imperative, religion, and upbringing in the room when you say it.


"as almost every technological innovation in history has been"

This is simply false. You really are the king of making unfounded claims.


I don't need to prove anything. You folks are the ones claiming harm. That said, AI is more akin to the invention of antibiotics than it is to the invention of any specific drug. Name any other entire category of technology from which no good has ever come. Just one.

I doubt you can. Even bioweapons led to breakthroughs in pesticides and chemotherapy. Nukes led to nuclear power, and even harmful AI stuff like deep fakes are being used for image restorations, special effects, and medical imaging.

You're just flat out wrong, and I think you know it.


You are speaking in tautologies. Yes, we know that technology investment often leads to great advancement and benefits for humanity, but that is not sufficient to obviate the need for conscientiousness and the reduction of harm. This technology will be used to disenfranchise people, and we need to be willing to say, "no, try again." Not to stop advancement, but to steer it into being more equitable.

We should be trying to optimize for the best combination of risk and benefit, not taking on unlimited risk in the promise of some non-zero benefit. Your approach is very much take-it-or-leave-it which leaves very little room for regulating the technology.

The GenAI industry lobbying for a moratorium on regulation is them trying to hand wave any disenfranchisement (e.g. displaced workers, youth mental health, intellectual property rights violated, systemically racist outcomes, etc).


> We should be trying to optimize for the best combination of risk and benefit

I 100% support this stance; it's good advice for life in general. I object to the ridiculous Luddite view espoused elsewhere in this thread.

>The GenAI industry lobbying for a moratorium on regulation is them trying to hand wave any disenfranchisement (e.g. displaced workers, youth mental health, intellectual property rights violated, systemically racist outcomes, etc).

There must be a balance, certainly. We can't "kill it before it's born," but we also need to be practical about the costs. I'm all in on debating exactly where that line should be, but I object to the idea that it provides no value at all. That's madness, and dishonesty.


Henry Ford didn't make his cars out of buggy whips. He made a new industry. He didn't cannibalize an existing one. You cannot make an LLM without digesting the source material.

> He made a new industry. He didn't cannibalize an existing one.

I don't see how you can claim the second part is true. Cars directly cannibalized other forms of self transportation.


? Cars don't "eat" horses. I wouldn't equate "making redundant" with "consuming"

LLMs don't literally eat artists. I think you understood the metaphor.

Cannibalizing a <product/industry/etc.> is a common phrase describing a new thing outcompeting an existing one to the degree that it significantly harms the latter's market share, sometimes to the point of figurative extinction. Redundancy is a very common reason for this to occur.

It has nothing to do with literally eating.


Digesting is a weird way to say "learning from." By that logic I've been digesting news, books, movies, songs, and comic books since I was born. My brain is a great big ol' copyright violation.

What matters here is not the source material, it's the output. Possessing or consuming copyrighted material is not illegal, distributing it is. So what matters here is: Can we say that the output is transformative, and does it work to progress the arts and sciences (the stated purpose of copyright in the US constitution)?

I would say yes to both things, except in rare cases of bugs or intentional copyright violations. None of the major AI vendors WANT these things to infringe copyright, they just do it from time to time by accident or through the omission of some guardrail that nobody had yet considered. Those issues are generally fixed fairly promptly (a few major screw ups notwithstanding).


So we have monkeys writing the same code over and over again, until the end of time. Because of "rules".

And for those of us living a reality of subjugation and fear, you're a fucking liar.

Reddit is growing because they introduced automatic machine translation and Indians have been joining at an increasing rate. That content is mixed into the English language content, but is of very low quality and irrelevant to many native English speakers. Similarly they mix the English content in with the Indian content.

Essentially, Reddit is also eating its own tail to survive, as the flood of low-quality, irrelevant content makes the platform worse for speakers of all languages, but nobody cares because "line go up."


It was presented without explanation and can be ignored without explanation.

You need an explanation of how people make norms & laws regarding what is acceptable or unacceptable in society and industry?

No such claim was made, therefore no such claim needs to be refuted. If people want to engage in conversation they will have to use their words to do it.

The article gets the part about organic data dying off right. Look at Google SERPs for an example. Almost nobody clicks through to the source anymore, so ad revenue is drying up for publishers, and people are publishing less, or publishing in places that pay them directly and live behind a paywall, like Medium. Which means Google has less data to work with.

That said, what it misses is that the AI prompts themselves become a giant source of data. None of these companies are promising not to use your data, and even if you don't opt-in the person you sent the document/email/whatever to will because they want it paraphrased or need help understanding it.


>AI prompts themselves become a giant source of data.

Good point, but can it match the old organic data? I'm skeptical. For one, the LLM environment lacks the truth and consensus mechanisms that the old SO-like sites had. Hundreds of users might have discussed the same or similar technical problem with an LLM, but there's no way (afaik) for the AI to promote good content and demote bad, as it (AI) doesn't have a concept of correctness/truth. Also, the old sites were two-sided, with humans asking _and_ answering questions, while they are only on the asking side with AI.


> (AI) doesn't have the concept of correctness/truth

They kind of do, and it's getting better every day. We already have huge swathes of verifiable facts available to them to ground their statements in truth. They started building Cyc in 1984, and Wikipedia just signed deals with all the major players.

The problem you're describing isn't intractable, so it's fairly certain that someone will solve it soon. Most of the brightest minds in society are working on AI in some form now. It's starting to sound trite, but today's AIs really are the worst that AI will ever be.


> Most of the brightest minds in society are working on AI in some form now.

Source? I haven’t met one intelligent person working on AI. The smartest people are being ground into dust. They’re being replaced by pompous overconfident people such as yourself.


> I haven’t met one intelligent person working on AI.

I get the impression that you don't meet a lot of people in general.


> 100s of users might have discussed the same/similar technical problem with an LLM, but there's no way (afaik) for the AI to promote good content and demote bad ones, as it (AI) doesn't have the concept of correctness/truth

The LLM doesn't, but reinforcement does. If someone keeps asking the model how to fix the problem after being given an answer, the answer is likely wrong. If someone deletes the chat after getting the answer, it was probably right.
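
A toy sketch of that idea; every name and weight below is invented for illustration, and real preference-training pipelines are far more involved:

```python
def implicit_reward(followups_on_same_issue: int, chat_deleted: bool) -> float:
    """Turn user behavior into a weak training signal (made-up heuristic)."""
    score = 0.0
    if chat_deleted and followups_on_same_issue == 0:
        score += 1.0                                  # got the answer, moved on
    score -= 0.5 * min(followups_on_same_issue, 4)    # "still broken" follow-ups
    return score
```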


AI is an entropy machine.

Those AI prompts that become data for the AI companies replace yet another signal human creators used to understand what people wanted: topics to explore, feedback on what they hadn't communicated well enough. That 'value' is AI stealing yet more energy from the system, resulting in even less, and less valuable, human creation.


> Is it your impression that scientists should be considered the paramount experts on climate change policy questions

Yes.

We should listen to people who use evidence and reason to suggest the best course of action. We should listen to people who have spent decades of their lives studying this issue for relatively little reward other than trying to make the world better.

We should NOT listen to semi-literate goobers who gained authority by being popular with simpletons they manipulated into voting for them, mostly through graft and trickery. Those people's opinions should be regarded as being equivalent in value to the opinion of your weird conspiracist uncle who helped vote them into power.


So your belief is that scientists are the people of evidence, reason, and selfless dedication to goodness, while policy people who are not scientists are incompetent and despicable?

I don't know. Is such a black-and-white, group-based worldview plausible? It's possible, I guess, but I find it hard to believe.


Decisions made based on science are more effective than those based on politics in almost 100% of cases. Especially when the subject of those decisions itself is science (the climate).

I wouldn't argue that all scientists are selfless; that would be silly. I would argue that the average scientist is less selfish than the average politician, yes.

Examine the motivations. Few people go into pure science seeking power or money. Most or all politicians do.


That brings us back to the original question: does the science tell us what to do? Or is it your contention that the scientists tell us what to do, and whatever scientists say about a decision is presumably the right way to make decisions based on science?

It is reasonable to ask "how should the science guide our actions?" I'm open to suggestions on the subject.

It is not reasonable to ask IF the science should guide our actions. The only alternative is madness.


On that question, here is an article I found helpful:

https://worksinprogress.co/issue/sunscreen-for-the-planet/


If there is scientific consensus that this is worth trying, and that the risk/reward ratio works out then I'm in favor of it.

Right now though, my own limited guess would be that the risk/reward doesn't justify it. The climate is a chaotic system which exemplifies the concept of sensitive dependence upon initial conditions. We could easily kill millions or even billions of people with a little "whoopsie". It might be better to wait until the alternative is worse than that potential cost.
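
(To make "sensitive dependence" concrete, here's the classic toy demonstration; the logistic map is a stand-in, not an actual climate model:)

```python
# Two trajectories of the chaotic logistic map (r = 4) starting a mere
# 1e-10 apart diverge to O(1) separation within ~40 steps.
x, y = 0.4, 0.4 + 1e-10
for step in range(40):
    x, y = 4 * x * (1 - x), 4 * y * (1 - y)
print(abs(x - y))  # order 0.1-1.0: long-range prediction is hopeless
```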

I would, of course, defer to a consensus of experts on the subject if such a thing exists. I am not one.


Ducted UV systems for your HVAC exist now, and they don't need to worry about UVC exposure, since the UV never leaves the system.

If I recall correctly, my furnace guy quoted me less than $2k for a whole-house system that attaches to the air intake on my furnace.

