How good is Codex? (smitop.com)
61 points by smitop on Aug 19, 2021 | 52 comments


My guess is the end result of all this "AI" assisted code-generation is that it will have the same impact on the software engineering industry as spreadsheets had on accounting. I also believe that this AI-powered stuff is a bit of a "two-steps forward, one step back" situation and the real innovation will begin when ideas from tools like Louise [1] are integrated into the approach taken in Codex.

When VisiCalc was released, departments of 30 accountants were reduced to 5 because of the improvement in individual worker efficiency; however, accounting itself remains largely unchanged and accountants are still a respected profession who perform important functions. There are plenty of programming problems in the world that simply aren't being solved because we haven't figured out how to reduce the burden of producing the software; code generation will simply increase the output of an individual software developer.

The same forces behind "no-code" are at work here. In fact I see a future where these two solutions intermingle: where "no-code" becomes synonymous with prompt-driven development. As we all know, however, these solutions will only take you so far -- and essentially only allow you to express problems in domains that are already well-solved. We're just expressing a higher level of program abstraction; programs that generate programs. This is a good thing and it is not a threat to the existence of our industry. Even in Star Trek they still have engineers who fix their computers...

[1] - https://github.com/stassa/louise


Good take but I'm not convinced. I would suspect (see epistemic status) that while correctness is easy for a single accountant to reason about and maintain in a spreadsheet, since there is a clean mapping from a precise Excel API to function, the mapping from natural language to code is problematic. I won a t-shirt in the recent OpenAI Codex challenge and found it hard to reason systematically about the behavior and generalization properties of generated code. When the generated code was wrong, it was frustrating on a different level than if I had written the code myself.

Epistemic status: I don't know anything about accounting


It's not about what Codex can do now, but what the technology will be able to do a few generations into the future. Anticipating this shift in the software labor market is important to many people on HN.

Codex as it stands is just a novelty, but it does show the shape of what's to come.


I disagree; people seem to really overblow how much boilerplate you write (though I guess it would vary from language to language). The most important job in SWE is to think logically, IMO. Codex isn't up for that.


It's not a good thing for those 25 other engineers.


I find it funny and concerning that you must prompt the model to do an SQL insert 'safely' to avoid injection vulnerabilities. I'm sure someone in the near future will find a way to train models to avoid well known hazards such as the OWASP Top 10 [1].
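For anyone unfamiliar with what the "safe" version looks like, here is a minimal sketch of the difference using Python's sqlite3 (the table and column names are made up for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

    name, email = "Robert'); DROP TABLE users;--", "bobby@example.com"

    # Vulnerable: string formatting splices untrusted input into the SQL text itself.
    # conn.execute(f"INSERT INTO users VALUES ('{name}', '{email}')")

    # Safe: a parameterized query; the driver passes the values separately,
    # so they can never be interpreted as SQL.
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
    conn.commit()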

The field of program synthesis based on NLP models is really starting to heat up - we have OpenAI Codex, GitHub Copilot, and a recent paper [2] from Google Research demonstrating that these same techniques can generate programs which solve mathematical word problems. Here is an example of the latter:

> Prompt: Please, solve the mathematical problem: a and b start walking towards each other at 4pm at a speed of 2 kmph and 3 kmph. They were initially 15 km apart. At what time do they meet? n0 = 4.0, n1 = 2.0, n3 = 15.0.

> Model output (python program):

    n0 = 4.0
    n1 = 2.0
    n2 = 3.0
    n3 = 15.0
    t0 = n1 + n2
    t1 = n3 / t0
    answer = n0 + t1

[1] https://owasp.org/www-project-top-ten/

[2] https://news.ycombinator.com/item?id=28217026


Interestingly, GitHub CoPilot is built on Codex.


So what it sounds like is that using Codex to write code is like replacing your computer with an intern: can write plausible-looking code and commit messages, but needs constant close attention from an experienced engineer to stop it from turning into a bug machine.


When I think about the work I do as a fairly blue-collar front-end engineer writing React and Swift code, it's interesting to consider how this could fit in and help.

The problem areas for the examples given here are somewhat self-contained, which is in contrast to the code I write, usually all about integrating multiple systems and bodies of knowledge (user device, network, data schema, industry practices, product requirements etc).

Too rarely, to my occasional regret, do I have the chance to write a purer function whose purpose can be explained as concisely as these examples. Helper functions ("count the words" etc.) are sprinkled throughout my code for sure, but are mostly provided to me by the platforms I inhabit.

Codex's ability to explain a piece of code in plain English seemed exciting at first, but the type of "other people's code" I am usually puzzling over has so many tentacles into the specific "business rules" of the service I'm writing to. How would Codex know about all that?

Of course Codex has already blown my mind several times, so I am quite open to it someday being able to ingest an entire set of interrelated codebases and break them down for me succinctly. That doesn't even seem far-fetched, based on what we've seen to this point.

The thing that is ringing a bell for me the most is the idea of it being able to understand APIs and generate correct code for them. That could be a neat learning tool and save some boilerplate. Kind of like scaffold-generation code, but on steroids ...

... perhaps it could even learn to simplify APIs, make them friendlier etc. Or translate between them?


Is Codex different from the regular code examples on OpenAI, e.g. https://beta.openai.com/examples/default-translate-code or https://beta.openai.com/examples/default-fix-python-bugs

I can do that now with my OpenAI account but Codex needs a specific invite. What's the difference?


Codex is different from GPT-3: it is a separate model trained specifically on source code, and it works better for that. I have access to Codex; when I click on those playground links I see that the engine selected on the right is "davinci-codex", and it says "The Codex models are currently in private beta. Usage is free during this period" at the bottom.
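For anyone curious what selecting that engine looks like outside the playground, here is a minimal sketch using the openai Python client as it works during the beta (it assumes you have the package installed and a key with Codex access; the prompt and parameters are just illustrative):

    import openai

    openai.api_key = "sk-..."  # placeholder; requires a key from the beta program

    # The playground request boils down to this; the only Codex-specific part
    # is the engine name ("davinci" would give you plain GPT-3 instead).
    response = openai.Completion.create(
        engine="davinci-codex",
        prompt="# Python 3\n# Return a list of the first n square numbers\ndef squares(n):",
        max_tokens=64,
        temperature=0,
    )
    print(response.choices[0].text)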


Looks like both of those are using Codex, as far as I can tell?


Instead of an AI that writes code, I would much prefer an AI that analyses my code, recommends better approaches, and does bug detection, like having an AI code review.

I can write my own code for now, thanks.


I jumped straight to the Conclusion expecting to find... a conclusion.


I ended up putting what you would expect in a conclusion in the introduction. The conclusion itself is pretty worthless in hindsight.


> Often Codex will write code that looks right at first glance, but actually has subtle errors.


So... it's like a real human!


We're getting out what we put in.


Can Codex write itself?


Wasn't the whole idea behind OpenAI that it would actually be "open"? Or is the name of the organization now entirely a misnomer?

Not only are they not releasing Codex (and GPT-3), but in order to get access to the API you have to apply for access and be judged against a proprietary set of criteria that are entirely opaque.

Furthermore, I imagine that if you do any innovative work building on top of Codex (or GPT-3) they would control that work product, they would be able to cut you off from accessing your work product at any time if it suits them, and they would be able to build off of your work themselves, co-opting any unique value that you may create.

Why the hell should anyone building an AI business even want to work with them? Sure, it might accelerate your effort right at the beginning, but if you are unable to reproduce your results outside of their platform, you will always be beholden to them.

In a few years will we be reading stories about unfortunate entrepreneurs who had built their businesses on top of OpenAI only to have the rug pulled out from under them, like Amazon sellers whose product was cloned by Amazon Basics, or Twitter clients cut off from the API, or iOS apps made redundant by their core functionality being copied by Apple, or search-driven businesses circumvented by the information cards that Google displays directly in the search results...? Etc, etc.

Am I missing something here?


OpenAI transitioned from non-profit to for-profit in 2019, took about $1 billion from Microsoft (there has been speculation that this was mostly in the form of Azure credits), and announced that Microsoft would be their preferred partner for commercializing OpenAI technologies: https://openai.com/blog/microsoft/

The name is now a complete misnomer.

There may still be some benefit for researchers to collaborate with them (same as with any of the other corporate research labs), but anyone trying to build a business on non-public APIs should obviously tread carefully.

So, no. You aren't missing anything.


They started as a non-profit and sort of claimed they would actually be open:

"As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world. We’ll freely collaborate with others across many institutions and expect to work with companies to research and deploy new technologies." - https://openai.com/blog/introducing-openai/

However, a couple paragraphs down might have been a clue to the likely future: "Sam, Greg, Elon, Reid Hoffman, Jessica Livingston, Peter Thiel, Amazon Web Services (AWS), Infosys, and YC Research are donating to support OpenAI."

Currently they are not really a non-profit and are mostly working with Microsoft.


Many of these people are very good at tax structuring — and IANAL — but it’s pretty hard to believe that what they’re doing with OpenAI is kosher.


They aren't a non-profit at all anymore: https://openai.com/blog/openai-lp/


They're governed by a separate OpenAI Nonprofit which is why I said "sort of", but yes it's more profit than non.


They needed money in order to compete with Google and Facebook, so they switched the model to "investors make a maximum of 10x return and after that they lose their stake" or something like that.

As for being open, I think keeping potentially dangerous tech private for a while, while openly sharing the results of research, is prudent. The last thing I want is for some AI model to go public and then we find a way to generate a bunch of computer viruses or propaganda.


Agreed that not releasing models might be a good thing, I'm just pointing out that's not how they initially pitched the organization. I wonder if they might have gotten less favorable publicity when they launched if they just said "we're starting an AI company to compete with Google".

As for the business model, there's nothing wrong with it in principle, it's just not what they said they would do. There's no reason a well-funded nonprofit research organization needs to compete with Google and Facebook. They changed their funding model because they wanted to compete, not because they needed to. And it hasn't been very long since they said "Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact." You have to assume they knew from the start that they'd probably want to pivot to a business.


I think it's a maximum of 100x return. So it's for-profit with practically unlimited return.

> Returns for our first round of investors are capped at 100x their investment (commensurate with the risks in front of us), and we expect this multiple to be lower for future rounds as we make further progress.

https://openai.com/blog/openai-lp/


OpenAI is approximately as open as the German Democratic Republic or Democratic People's Republic of Korea were/are democratic.


Lenin believed in democracy. That is: "democracy of the elite". That is: democracy in the politburo. That is: democracy among a group of a handful of people. And no more.

By Lenin's definition, all those Democratic People's Republics really were democratic.

What that means for OpenAI, I don't know :/


This is how America was conceived as well. Democracy was not meant to be universal and there were checks placed even on the limited portion of society who could vote.

The electoral college, for example, is set up this way. The idea is that electors—who were not democratically elected—could block a populist candidate.

As we saw in 2016, this check failed.


Not the same. Lenin believed in a democracy of not more than 8 or so people. The Bolsheviks won the Soviet elections and Lenin still staged a coup.

The American Constitution is an indirect democracy. Direct at the local level, indirect at the Federal level (though eventually by statute and amendment it became direct for Congress).

> As we saw in 2016, this check failed.

Nonsense. That "failure mode" was designed for. It has happened a few times. Working as expected.


Huh? Are electors not, y’know, technically who one is voting between when one votes for president?

If you mean “what electors are on the ballot” isn’t democratically decided, but rather, is determined by the parties, then yeah I guess that’s true?


> By Lenin's definition, all those Democratic People's Republics really were democratic.

Not by Lenin's definition. Just the definition of democracy and where it came from. Democracy comes to us from the Greeks, where only the wealthy slave-owning elites could vote.

The Founders also followed that model. Only wealthy landowning whites could vote. In the first few elections, only around 4 or 5% of the population voted.

Isn't it crazy how propaganda has shifted your understanding of democracy? Democracy was never "of the people, by the people, for the people"; it was always about the elite few.


> The Founders also followed that model. Only wealthy landowning whites could vote. In the first few elections, only around 4 or 5% of the population voted.

The Founders did not write that into the Constitution.

> Isn't it crazy how propaganda has shifted your understanding of democracy? Democracy was never "of the people, by the people, for the people"; it was always about the elite few.

Propaganda is what you're writing.


> The Founders did not write that into the Constitution.

Because the founders wanted voting to be controlled at the state level, not the federal level...

> Propaganda is what you're writing.

Historical facts are propaganda?

So what is the propaganda? That only landowning whites could vote? That democracy came from the ancient Greeks? That the ancient Greeks owned slaves and only allowed wealthy slave owners to vote?


I think OpenAI tried to be Open, but then ran into two problems:

First, turns out that neural nets get better when you throw more compute at them. OpenAI was full of researchers looking to push the state of the art, and the state of the art became less and less accessible to the average person, who could not afford the supercomputers necessary to train GPT-3. "Democratizing deep learning" became less important as a goal, since it conflicted with the true priority internally: improving the state of the art in deep learning.

Second, it looks like they lost the interest of their initial funders. The execs were left with a big money hole in their budget, and had to go looking for some way to fill it. Bingo bango bongo, and now they are a for-profit looking for income streams.

I don't feel critical of them. It's very hard to do something both altruistic and expensive. Money doesn't just flow to those looking to do good in the world.


> Not only are they not releasing Codex (and GPT-3), but in order to get access to the API you have to apply for access and be judged against a proprietary set of criteria that are entirely opaque.

I have yet to hear of one person who has gotten access without either a) being Twitter-notable in the ML space, or b) using a personal connection to jump the queue (I hit up someone a couple steps removed from OpenAI and got lucky). As far as I can tell they are just collecting email addresses to gauge interest, and are not even evaluating people who cold-apply through their form.

Please correct me if I'm wrong, though, I only know what I've heard within my own network! It's totally possible they're allowing a very very slow trickle of external unconnected people in.


I got access and I am a nobody. I registered a long time ago and it took maybe 8 months to get in.


You can auto-generate believable fake news with it. This will be a tricky one to solve.


Yup it's only going to get worse - at least for now, it's difficult for these models to generate long news articles that are coherent.

> mean human accuracy at detecting articles that were produced by the 175B parameter model was barely above chance at ∼52% [...] Human abilities to detect model generated text appear to decrease as model size increases [...] This is true despite the fact that participants spend more time on each output as model size increases [1]

> for news articles that are around 500 words long, GPT-3 continues to produce articles that humans find difficult to distinguish from human written news articles [1]

[1] https://arxiv.org/pdf/2005.14165.pdf


Not just news, you can create bot armies that push narratives and influence online discussions.


Solution: stop reading the news. You may as well start now.


It's outright depressing that we have to fight back example by example to show that Codex and similar scams are nowhere near able to imitate the ability of a programmer to code something correctly.

It's degrading of the work that we do for one thing.


I mean, it automates a lot of the work I don't want to do. Being able to write a unit test just by describing it is fantastic.


You are in denial if you are doubting the effectiveness of Codex. Keep in mind that this was just a beta.

If you saw how GPT-2 improved to GPT-3 in a year, it's easy to see where this is gonna go over the next few iterations.

It's a 2007-iPhone-level catalyst that's going to dramatically shift the landscape.


And GPT-4 will almost work when it is powered by a Dyson sphere and GPT-5 would require more text to train than exists in all the planets of the universe.

All of those things are structurally inadequate for the problem in front of them, and they make up for it for the same reason ELIZA seemed intelligent: people are willing to believe.

You could put fantastically more resources into that approach and find you're approaching an asymptote. It's the deadliest trap in advanced technology development and it happens when you ignore the first law of cybernetics.

Anyone who's been a practitioner in the software field has experienced that repairing mistakes in a program written by somebody clueless is almost always vastly more expensive than writing it correctly to begin with. I remember helping a friend "cheat" at CS 101 by stealing somebody else's homework, finding that the program was wrong, and putting a lot of effort into debugging and fixing it, never mind changing the symbol names and taking other measures to hide the origin of the program.

It might be my karma, but fixing the program I stole turned out to be excellent preparation for a career in software dev.


And yet today, the descendants of ELIZA provide first-line customer support for thousands of companies.


Extremely poorly. Companies use them because they're cheap, not because they're good.


Where are you getting these numbers for GPT-4/5 resource requirements from? I’ve heard of diminishing returns happening though not to the degree you’re describing.


The numbers don't matter.

The fact that it will converge towards an asymptote is qualitative, not quantitative because the structure of the system is wrong for the problem.

If the people who were doing this research were physicists you could excuse them, but they are computer scientists and should know better.

It's a basic problem that GPT-x allocates the same amount of resources regardless of the complexity of the problem. Some problems in programming and language understanding have a SAT/SMT-like structure (e.g. for each pronoun, solve for what the pronoun refers to; figure out what the subjects/objects of all the verbs are; etc.), which scales in a certain way. Other problems in "understanding" necessarily require that you try one interpretation, work through the consequences of that interpretation until you find a dead end, back up to the place where you failed, try something else, and test it until done.
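To make the "try, hit a dead end, back up" shape concrete, here is a toy backtracking sketch; the pronoun/referent setup and the consistency rule are made up purely for illustration, not anything GPT-x or Codex actually does:

    # Toy backtracking search: commit to an interpretation, follow its
    # consequences, and back up to try another when it hits a dead end.
    def resolve(pronouns, candidates, consistent, assignment=None):
        assignment = assignment or {}
        if len(assignment) == len(pronouns):
            return assignment                      # every pronoun resolved
        pronoun = pronouns[len(assignment)]
        for referent in candidates[pronoun]:
            trial = {**assignment, pronoun: referent}
            if consistent(trial):                  # do the consequences still hold?
                result = resolve(pronouns, candidates, consistent, trial)
                if result is not None:
                    return result                  # this branch worked out
            # otherwise: dead end, back up and try the next referent
        return None

    # Usage: two pronouns, with a made-up constraint that they must refer
    # to different people.
    print(resolve(
        ["he", "she"],
        {"he": ["Alice", "Bob"], "she": ["Alice", "Bob"]},
        lambda a: len(set(a.values())) == len(a),
    ))  # {'he': 'Alice', 'she': 'Bob'}

The amount of work this kind of search does depends on the input, which is exactly the property a fixed-size forward pass doesn't have.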

Some of these problems that you might want to "solve" with GPT-x might really be problems that can't be solved in a finite amount of time.

Something like GPT-x with some extra structural features might be able to rise to some of these challenges but we are not hearing about that, we are hearing that adding more nodes and more training data and training for longer is the way to salvation.

(So far as "diminishing returns" matter the question is "does the system ever get good enough that it can be let off the leash?" and my answer is no.)


Well, we are already hitting pretty strong limits in terms of corpus size.

For reference, GPT-3 has 175 billion parameters. That's a few TB (uncompressed).

GPT-3 was trained on 45 TB of text. Most of it comes from 8 years of web crawling, and a significant chunk of the remainder comes from online book corpora [0]. At some point you just run out of human-written text to train on.
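As a quick back-of-envelope check on those sizes (the bytes-per-parameter figures below are my assumptions about precision, not numbers from the paper):

    params = 175e9                       # GPT-3 parameter count
    corpus_tb = 45                       # reported size of the training text

    weights_fp32_gb = params * 4 / 1e9   # ~700 GB of raw 32-bit weights
    weights_fp16_gb = params * 2 / 1e9   # ~350 GB at half precision
    train_state_gb = params * 16 / 1e9   # ~2.8 TB if you also count Adam optimizer state

    print(weights_fp32_gb, weights_fp16_gb, train_state_gb, corpus_tb)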

You can't just keep making the model bigger (by orders of magnitude I mean) without scaling the input data the same. And at some point we just run out of the latter.

[0]: https://in.springboard.com/blog/openai-gpt-3/



