siboehm's comments | Hacker News

You can achieve close to the same thing using uBlock's element zapper, without installing extra plugins. I use the same trick for SO (the 'Hot Network Questions' dark pattern) and other sites.


Well, yes, but only assuming that YouTube does not change anything. If they do, then your uBlock rules won’t work, but RYS will presumably be updated to still work.


Reading your blog post, it wasn't clear to me how you ran your code through GPT-4. Which prompts did you use?


I brought up the web UI, and said something like this: "Here is a Go source file, see any problems, issues, or suggested bug fixes? <paste the Go code>"

Simple as that. Of course, sometimes I got the "message too big" error. So I pasted bits of the source files, choosing pieces I thought were reasonably self-contained. I also fed much of my unit tests through it, asking "see any missing test cases?" While some of the answers were not that helpful, digesting the feedback from GPT-4 made me think more about my code and make some changes for the better.


Nice, that's simple enough! You may want to try talking to it via the API; last time I checked, the web UI doesn't accept prompts >4K tokens, while GPT-4 via the API has an 8K token limit. And then there's the 32K version...
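For reference, sending that kind of prompt through the API looks roughly like this with the pre-1.0 `openai` Python package (the client interface has changed since); the file path is a placeholder:

    # Sketch: the same "review this file" prompt, sent via the API instead of the web UI.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    with open("main.go") as f:  # placeholder path
        source = f.read()

    prompt = (
        "Here is a Go source file, see any problems, issues, "
        "or suggested bug fixes?\n\n" + source
    )

    response = openai.ChatCompletion.create(
        model="gpt-4",  # 8K context; "gpt-4-32k" if you have access
        messages=[{"role": "user", "content": prompt}],
    )
    print(response["choices"][0]["message"]["content"])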


I can't believe they are only charging me $20/month for access to GPT-4. I'd pay more for it.


I’d pay more if it was faster, had API access, and stronger privacy.

Right now Sam Altman is channeling Zuckerberg by claiming to be the good guy changing the world, while in reality hoarding data and asking Congress to build him a moat.


Exactly. Not only that, but according to a recently filed lawsuit, they’re giving YC batch companies preferential access first, and everyone else after.


1. It can do some reformatting tasks faster than I can do them by hand. Example: Inline FuncA into FuncB <paste code for both functions>.

2. For more complicated tasks it requires good prompting. Example: Tell me three ways to fix this error, then pick the best way and implement it. <paste error> <paste relevant code>. Without the "step-by-step" approach it almost never works.

3. It's pretty good at writing microbenchmarks for C++. They always compile, but require some editing. I use the same prompting approach as (2.) for generating microbenchmarks.

4. It's pretty useful for explaining things to me that I then validate later via Google. Example (I had previously tried and failed to Google the answer): The default IEEE rounding mode is called "round to nearest, ties to even". However, all large floating point numbers are even. So how is it decided whether 3,000,003 (which is not representable in fp32) becomes 3,000,002 or 3,000,004?

5. It can explain assembly code. I dump plain objdump -S output into it.

The main limitation seems to be the UI. chat.openai.com is horrible for editing large prompts. I wrote some scripts myself to support file-based history, command substitution, etc.
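Not the actual scripts, but file-based history boils down to something like this (same pre-1.0 `openai` package as above; the history path and script name are made up):

    # Hypothetical sketch: the whole conversation lives in a JSON file, and each
    # run appends one user turn (read from stdin) and one assistant turn.
    # Assumes OPENAI_API_KEY is set in the environment.
    import json, os, sys
    import openai

    HISTORY = "chat_history.json"  # made-up path

    messages = json.load(open(HISTORY)) if os.path.exists(HISTORY) else []
    messages.append({"role": "user", "content": sys.stdin.read()})

    reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    answer = reply["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": answer})

    with open(HISTORY, "w") as f:
        json.dump(messages, f, indent=2)
    print(answer)

Command substitution then comes for free from the shell, e.g. something like `echo "Explain this assembly: $(objdump -S ./a.out)" | python chat.py`, with chat.py being the hypothetical script above.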


But 3000003 is exactly representable in fp32, as are 3000002.5 and 3000002.75 before it, and 3000003.25, 3000003.5, and 3000003.75 after it?

https://www.h-schmidt.net/FloatConverter/IEEE754.html

And AFAIK the LLMs, lacking the ability to actually calculate, can't check which numbers are representable and which aren't?


Ha, you're right. I should've used 30,000,000. The answer it gave regarding how to do "ties to even" was still correct though: "evenness" is decided by the least significant bit of the mantissa, not by the "evenness" of the decimal representation.
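To make it concrete with numbers in the range where this does apply (a quick numpy check; above 2^24 the fp32 spacing is 2, so odd integers there are exact ties):

    import numpy as np

    # "Ties to even" picks the neighbour whose significand has an even last bit,
    # which is why one of these ties rounds down and the other rounds up.
    print(np.float32(30_000_001))  # 30000000.0 (significand 15,000,000 is even)
    print(np.float32(30_000_003))  # 30000004.0 (significand 15,000,002 is even)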


Author here: Seems like a good trick! Though won't this affect shared memory alignment and make me lose those LDS.128 instructions? Or do these not require alignment? There are so few good docs on SASS.

In general I'm still confused about whether vectorized load instructions (LDS.128) necessarily lead to bank conflicts or not. My impression was that consecutive 32b floats get mapped to different banks, so to avoid conflicts I'd want the warp to load 32*32b consecutive elements at each step.
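To spell out the mapping I have in mind (using only the documented layout of 32 banks, each 4 bytes wide; this says nothing about how the hardware actually splits a 128-bit request into transactions):

    # Which banks does each thread of a warp touch if thread t does a float4
    # load starting at element 4*t? bank = word_index % 32 per the programming guide.
    for t in range(32):
        banks = [(4 * t + i) % 32 for i in range(4)]
        print(f"thread {t:2d}: banks {banks}")
    # Threads 0-7 together hit banks 0..31 exactly once; threads 8-15, 16-23
    # and 24-31 each repeat the same pattern.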


Hmm, I think you might have to adjust the padding to be 128 bits then:

    __shared__ float As[(CHUNKSIZE + 4) * CHUNKSIZE];  // pad by 4 floats = 128 bits
Ultimately it's down to trial and error, like always with GPGPU.


I built this decision tree (LightGBM) compiler last summer: https://github.com/siboehm/lleaves

It gets you ~10x speedups for batch predictions, more if your model is big. It's not complicated; it ended up being <1K lines of Python code. I heard a couple of stories like yours, where people had multi-node Spark clusters running LightGBM, and it always amused me, because if you compiled the trees instead you could get rid of the whole cluster.
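Usage is roughly this (a sketch from memory; the README has the exact API):

    # Compile the dumped LightGBM model.txt once, then predict on numpy arrays as usual.
    import numpy as np
    import lleaves

    model = lleaves.Model(model_file="model.txt")  # the file from Booster.save_model()
    model.compile()  # one-time LLVM compilation of the trees

    X = np.random.rand(10_000, 30)  # placeholder batch; 30 = your model's feature count
    preds = model.predict(X)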


Wow, very interesting, thanks for this. Daily batch predictions is all we do. I’m the maintainer of miceforest[1], do you think this would integrate well into the package at a brief glance? I’m always looking for ways to make this package faster.

[1] https://github.com/AnotherSamWilson/miceforest


I had a brief look at your package, and my impression was that it only changes model training. If this is correct, then the format of the model.txt (from calling `lgbm.save(model, "model.txt")`) is the same as in regular LightGBM. That would mean you can use my library for inference.
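Concretely, if you can get at the underlying `lightgbm.Booster` that miceforest trains (I haven't checked how the package exposes it), the dump is just the standard one, which lleaves can then compile as in the snippet above:

    # Stand-in booster; in miceforest this would be the internally trained
    # lightgbm.Booster (I haven't checked how the package exposes it).
    import lightgbm as lgb
    import numpy as np

    X, y = np.random.rand(500, 5), np.random.rand(500)
    booster = lgb.train({"objective": "regression", "verbosity": -1},
                        lgb.Dataset(X, y))
    booster.save_model("model.txt")  # plain-text format, same as regular LightGBM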


You can be very worried about the medium-term dangers of AGI even if you believed (which I don't) that consciousness could never arise in a computer system. I think it can be a useful metaphor to compare AGI to nuclear weapons. Currently we're trying to figure out how to make the nuclear bomb not go off spontaneously, and how to steer the rocket. (One big problem w/ the metaphor is that AGI will be very beneficial once we do figure out how to control it, which is harder to argue with nuclear weapons).

Most of these AGI doom-scenarios require no self-awareness at all. AGI is just an insanely powerful tool that we currently wouldn't know how to direct, control or stop if we actually had access to it.


> "Most of these AGI doom-scenarios require no self-awareness at all. AGI is just an insanely powerful tool that we currently wouldn't know how to direct, control or stop if we actually had access to it."

You're talking about "doomsday scenarios". Can you actually provide a few concrete examples?


Over the course of years, we figure out how to create AI systems that are more and more useful, to the point where they can be run autonomously and with very little supervision produce economic output that eclipses that of the most capable humans in the world. With generality, this obviously includes the ability to maintain and engineer similar systems, so human supervision of the systems themselves can become redundant.

This technology is obviously so economically powerful that incentives ensure it's very widely deployed, and very vigorously engineered for further capabilities.

The problem is that we don't yet understand how to control a system like this to ensure that it always does things humans want, and that it never does something humans absolutely don't want. This is the crux of the issue.

Perverse instantiation of AI systems was accidentally demonstrated in the lab decades ago, so an existence proof of such potential for accident already exists. Some mathematical function is used to decide what the AI will do, but the AI ends up maximizing this function in a way that its creators hadn't intended. There is a multitude of problems regarding this that we haven't made much progress on yet, and the capabilities of these systems and our ability to control them appear to be unrelated.

A catastrophic accident with such a system could e.g. be that it optimizes for an instrumental goal, such as survival or access to raw materials or energy, and turns out to have an ultimate interpretation of its goal that does not take human wishes into account.

That's a nice way of saying that we have created a self-sustaining and self-propagating life-form more powerful than we are, which is now competing with us. It may perfectly well understand what humans want, but it turns out to want something different -- initially guided by some human objective, but ultimately different enough that it's a moot point. Maybe creating really good immersive games, figuring out the laws of physics or whatever. The details don't matter.

The result would at best be that we now have the agency of a tribe of gorillas living next to a human plantation development, and at worst agency analogous to that of a toxic mold infection in a million-dollar home. Regardless, such a catastrophe would permanently put an end to what humans wish to do in the world.


> they can be run autonomously and with very little supervision produce economic output that eclipses that of the most capable humans in the world.

What’s the evidence for this?

> Perverse instantiation of AI systems was accidentally demonstrated in the lab decades ago

What are you referring to?


By the time you find such evidence, it could already be close to game over for humanity. It’s important to get this right before that.

We already have significant warnings. See for yourself whether the latest models like Imagen, Gato, and Chinchilla have economic value and can potentially cause harm.


Historical examples of perverse instantiation are everywhere: evolutionary agents learning to live off a diet of their own children, machine learning algorithms that were supposed to learn to grip a ball instead cheating by performing ball-less movements that the camera erroneously classified as successful, an evolutionary algorithm meant to optimize the number of circuit elements in a timer instead producing a timer circuit that worked by picking up an external radio signal unrelated to the task, and so on. Some examples are summarized here: https://www.wired.com/story/when-bots-teach-themselves-to-ch...

GP wanted a concrete example of a doomsday scenario of failed AI alignment, so in that context extrapolating to a plausible future of advanced AI agents should suffice. If you need a double-blind peer reviewed study to consider the possibility that intelligent agents more capable than humans could exist in physical reality, I don't think you're in the target audience for the discussion. A little bit of philosophical affinity beyond the status quo is table stakes.


This has always confused me as well. What would be the reason why some adversary would choose to craft an adversarial example and deploy it in the real world versus the much easier solution to just remove / obscure the sign?


Depending on how big or small it needs to be, potentially for subtlety? Especially on current roads that are shared by humans and self-driving systems, a human observer will immediately notice that something is terribly wrong with a replaced sign.

But... around here at least, signs have stickers or graffiti on them often enough. Like adding the name of a politician under a stop sign: "Stop [Harper]". An appropriately made adversarial example won't stick out visually the same way that a wholesale sign swap will.


Because NeurIPS doesn't publish papers on stop sign removal yet :P


Warfare comes to mind, as weapons gain increasingly powerful AI functions and become autonomous.


Here's a good review of the research around this problem: https://nintil.com/bloom-sigma/

The author concludes (IMO, but you should read it yourself) that while tutoring does have positive effects, the 2 Sigma effect size measured by Bloom was probably an outlier.


Keep in mind that Benjamin Bloom attributes much of the success of tutoring to the affective learning components. That is not addressed in the meta analysis.

Bloom described how the tutor and the student achieved an emotional connection that is often difficult to achieve with a class of students.

This is so critical, because the barrier to learning for many students is emotional. It’s not that they are really trying and just don’t get it; it’s that they don’t have the capacity to care enough to engage over time. Emotional barriers to learning are super widespread; being “bored”, for instance, is an emotional response. Transformational learning takes place when there is an authentic emotional motivation to succeed. Human contact can support that. It’s also why stuff like ALEKS only goes so far. There is no emotional resonance like there is with a human.


I think you could combine both models: Students use tutoring software and their teacher spends time with each student celebrating their progress and encouraging and teaching them when they struggle. The teacher's primary job would be to teach the children how to learn and succeed in academics. Specifically, teachers would teach students how to make plans, follow plans, focus, how to think about success, how to think about failure, determine the cause of failure, update their plans, develop determination, evaluate their own mood, and recognize their mental habits. Teachers would also assign, grade, and give feedback on student projects. Projects would have multiple iterations before a final grade.


> being “bored” for instance is an emotional response

Don't be so patronising. People are bored (not "bored") because they're being asked to do something they don't care about. That's an extremely common experience. About a third of high school students are bored every day in every class, and another third report being bored in at least one class every day. Most people have no interest in intellectual pursuits. Their preferences are completely valid.


Patronizing?

I’m sure we can sit here all day and debate what material kids should learn. My point is that 1. people do well in any subject when they care about it and 2. tutors often help kids care. I’m trying to distinguish this emotional effect of tutoring from the cognitive effect; otherwise it is difficult to explain the 2 sigma findings.


A relevant quote from the work you cite:

> The history of the educational research literature is one plagued with low quality small sample size studies that were done decades ago, with less work being done now. It can be that now researchers are focusing on studying other instructional methods. Still, the fact that most large RCTs tend to find little effects should make us have a sceptical prior when presented with a new educational method.


Thanks. My takeaway from the Nintil article is that nobody has performed a good study on the effectiveness of mastery learning vs traditional teaching. All of the studies have some fatal flaw: not randomized, small sample size, study duration too short, interval between exams too long, not providing specialized remedial content to students, or not actually requiring mastery.

I think the massive effects shown by software tutoring in the DARPA studies point to the mechanism: frequent exams and specialized remedial content. Good tutoring software continually tests students for mastery, identifies specific misconceptions, and provides specialized remedial content for each misconception. The automated software can perform this iteration for each core concept, multiple times per hour. Students frequently get feedback on problems so they waste little time trying to learn material when they don't have the pre-requisite concepts. Students also frequently pass section mini-exams and enjoy feelings of accomplishment. These positive feelings help with learning.

Compare that to the mastery learning studies performed. The studies gave exams once a week or once every 4 weeks. A student with a crucial misconception will struggle for weeks before they finally understand the content. During that time, they feel frustrated and unmotivated.

We need a good study of mastery learning.

We also need researchers to design their studies better.

IDEA: A new kind of journal with an open study design process. Researchers submit their study proposal, experimental procedures, example raw data, code for cleaning and filtering the raw data, code for statistical analyses, code for generating tables and graphs from data, and a paper template that includes different conclusions based on the values produced by the code. The paper template pulls in the tables and graphs generated by the checked-in code. All of this content is public. Anyone may register an account and provide feedback. Vetted researchers volunteer to review the proposal and code. They receive credit in the resulting paper. When reviewers give LGTM, then the journal and researchers commit to publishing the paper, regardless of the results, and before they have done any experiments. A separate LGTM is required from an experienced statistician. The code includes assertions for sample sizes and valid data ranges.

The researchers must record video of themselves as they perform the experiments. They must also record raw data from their instruments. They must upload these recordings and raw data. The reviewers must LGTM the recordings and any PII redactions. The researchers must get LGTM for all changes to the code and paper template. The journal's servers execute the template and generate the final paper. When someone later discovers an error in the analysis or code, they can file a ticket or send a pull-request with a proposed change. The researchers commit to reviewing every issue and PR within a time limit. If they fail to do that, then the reviewers must handle it. If the reviewers also fail to do it, then the journal assigns another qualified volunteer as a new reviewer to handle it. After making a change, the system generates a new version of the paper.

Anyone may "star" the paper and receive notifications whenever it changes or there is a change to any of the papers it references. If a paper is withdrawn, the system automatically adds warnings to all papers that reference it.


Thank you for the link. There is a lot to take in there. (Perhaps the author could have provided a computer-based tutor ;-) )


I've been running a setup with Recoll and https://github.com/ArchiveBox/ArchiveBox for a few months now [1]. Each morning archivebox scrapes all new links that I've put into my (text-based) notes and saves them as HTML singlefiles. Then Recoll indexes them.

It's very fast and ~4 lines of code. It's surprising how often I rediscover old blog posts & papers that are much better than what Google yields me.
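The glue is roughly this (a sketch, not the actual script: paths and the URL regex are placeholders, and it assumes `archivebox` and `recollindex` are already configured):

    # Pull URLs out of the notes, feed them to ArchiveBox, then reindex with Recoll.
    import re
    import subprocess
    from pathlib import Path

    NOTES_DIR = Path("~/notes").expanduser()      # placeholder
    ARCHIVE_DIR = Path("~/archive").expanduser()  # ArchiveBox data dir, placeholder

    urls = set()
    for note in NOTES_DIR.rglob("*.md"):
        urls.update(re.findall(r"https?://\S+", note.read_text(errors="ignore")))

    subprocess.run(["archivebox", "add"], input="\n".join(sorted(urls)),
                   text=True, cwd=ARCHIVE_DIR, check=True)
    subprocess.run(["recollindex"], check=True)  # Recoll then picks up the new HTML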

From my experience Recoll isn't very good at searching for aliases sadly.

[1] https://siboehm.com/articles/21/a-local-search-engine


Tangential but I really love the look of your site, especially the footnotes displayed adjacent to the text. Did you create the layout yourself? I have been looking for a theme like this for a paper discussion website.


Not my site, but it looks like a theme based on Tufte CSS: https://edwardtufte.github.io/tufte-css/


Sounds cool, thanks for the inspiration!


In stories like this it's important to consider that returns on R&D for these large pharma companies have recently been getting quite low, 1.8% in 2019 according to [0].

[0] https://www.reuters.com/article/us-pharmaceuticals-r-d/pharm...


Why should I, a person who is not a large pharma company, care about this?


Because it means that if you suffer from anything that isn’t extremely common, life-threatening, or severely disabling, pharma companies have no financially viable path to developing new treatments for you.

When kids say “I wanna grow up to develop a cure for cancer”, the default answer shouldn’t be “That’s stupid. Even if you found a cure, it would be too expensive to get to market. Why not do something that will actually benefit society, like getting a law degree?”

I know it’s hip to hate on pharma companies, but at some point you have to realize that all those kids grew up to become scientists who are now working in these pharma companies, trying to find drugs that will help people 20 years from now. But the fact is that the scope of research is limited by the financial reality these companies face. If the net ROI on R&D goes negative, companies stop investing in R&D, and the first thing to happen is that scientists working on potential high-impact, low-chance-of-success projects get fired. So out goes cancer treatment, while the team of people trying to figure out which color of packaging for aspirin tablets maximizes customer retention gets doubled.


Put another way, I care as much about pharma company profits as pharma companies care about my personal health.


Those numbers are something of a financial fiction, used to garner sympathy for big pharma, lobby for reduced regulations, and let “AI” startups pitch their products.


Can you substantiate this?


Substantiate? No, you'll have to wait for the next Panama Papers for that.

But the process is similar to https://en.wikipedia.org/wiki/Hollywood_accounting : there are so many levels of suppliers/customers/expenses that may or may not belong to the same conglomerate that it's very easy to make the numbers of any individual organization reflect whatever you want them to reflect.


If you torture the data, it will speak.

