This is awesome, can't wait to get API access to the 32k-token model. Rather than this approach of just converting the whole repo to a text file, what I'm thinking is: you can let the model decide which files are most relevant.

The initial prompt would be: "Person wants to do x; here is the file list of this repo: ...; give me a list of files that you'd want to edit, create or delete." Then take that list, try to fit the contents of those files into 32k tokens, and re-prompt with: "User is trying to achieve x; here are the most relevant files with their contents: ...; give me a git commit in the style of git patch/diff output." From playing around with it today, I think this approach would work rather well and could be a huge step up from AI line autocompletion.
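Sketched in Python against the openai client, it might look something like this (the model name, prompts, and helper names are illustrative assumptions, not a tested implementation):

    import openai

    def pick_files(task, file_list):
        # Step 1: ask the model which files it would edit, create or delete.
        resp = openai.ChatCompletion.create(
            model="gpt-4-32k",  # assumed name for the 32k-context model
            messages=[{
                "role": "user",
                "content": f"Person wants to do: {task}\n"
                           f"Here is the file list of this repo:\n{file_list}\n"
                           "Give me a list of files you'd want to edit, "
                           "create or delete.",
            }],
        )
        return resp["choices"][0]["message"]["content"]

    def propose_patch(task, files_with_contents):
        # Step 2: re-prompt with the chosen files' contents and ask for a
        # diff, using the system message to enforce the output format.
        resp = openai.ChatCompletion.create(
            model="gpt-4-32k",
            messages=[
                {"role": "system",
                 "content": "Respond only with a git patch/diff."},
                {"role": "user",
                 "content": f"User is trying to achieve: {task}\n"
                            "Here are the most relevant files with their "
                            f"contents:\n{files_with_contents}"},
            ],
        )
        return resp["choices"][0]["message"]["content"]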



Please see the following repos for tools in this area:

https://github.com/jerryjliu/llama_index

and/or

https://github.com/hwchase17/langchain


Maybe someone can correct me, but my understanding is that you would calculate the embeddings of code chunks, and the embedding of the prompt, and take those chunks that are most similar to the embedding of the prompt as context.
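Concretely, a minimal sketch of that retrieval step (assuming the openai Python client and the ada-002 embedding model, whose vectors are unit-length as far as I know, so a dot product is cosine similarity):

    import numpy as np
    import openai

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002",
                                       input=text)
        return np.array(resp["data"][0]["embedding"])

    def most_similar_chunks(prompt, code_chunks, k=5):
        # Score every chunk against the prompt and keep the top k as
        # context. In practice you'd precompute and store the chunk
        # embeddings instead of re-embedding them on every query.
        q = embed(prompt)
        ranked = sorted(code_chunks,
                        key=lambda chunk: float(np.dot(q, embed(chunk))),
                        reverse=True)
        return ranked[:k]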

Edit: this, btw, is also the reason why I think this popped up on the HN frontpage a short while ago: https://github.com/pgvector/pgvector


This sounds like a reasonable start. Eventually we need to get to the point where we can expose an API for models to request additional information on their own.


Exactly, but this has some scary implications for the future - imagine when it is common practice to allow AIs API access as a matter of course...

When given a prompt, the model crawls many APIs to build the response - the power of such activity/features will be scary when turned to authoritarian goals.

Imagine if the prompt is "Select all users who have political beliefs, posts, comments, links from APIs A, B, C, etc where sentiment appears to dissent from [party line]"


Imagine that these prompts are not triggered by humans anymore, but by the AI invoking itself.

Sam Altman takes comfort in the thought that their AI does nothing without a human prompting it, so there's a human in the loop - a circuit breaker, if you will.

This assumption is rapidly becoming a mere hope: right now, probably hundreds of developers are working on systems which, once put into production and connected to other systems, might just come down to the AI calling itself and giving itself orders.


Imagine that you just outdid me on how dystopian this AI cyberpunk future could get.

Imagine I want you to shoot me.

Imagine I want [whatever]GPT to make a cyberpunk anime based on such...


It's just so slow for the autocompletion use case done like that. Ideally, you're never chaining serial requests to the LLM. And even if you stuff all the data into a single prompt, execution time seems to be superlinear in the number of tokens, which again gets super slow.


Yeah, I agree it's too slow for autocompletion at the moment, but this would be for full feature implementations, not just autocomplete. For example, if I have a repo I want to add a table and REST API implementation to, it can do this: https://imgur.com/a/mIJvaJr (ignore the formatting errors in the UI - somehow parts of it show up as code and others don't, but the API wouldn't have this issue, especially since you can use the system message to enforce the output format).

I'm happy to wait even 30-60 seconds for this, which I can easily evaluate and criticize (and the model will correct it), and then just apply the patch and move on. I think the results will be much better with the 32k model, but that remains to be seen.


Working with GPT becomes like coding in plain English.


There's a reason we don't code in plain English though. Natural language has ambiguities. This is the reason we invented programming languages.

It's best illustrated by the old joke:

  A programmer's wife told him "Go to the store and buy milk and if they have eggs, get a dozen." He came back a while later with 12 cartons of milk.

A good chunk of all bugs in software are down to the requirements being insufficiently well specified. Further, many bugs are the discovery of new requirements when informal specification encounters reality.

"Read from standard input into this byte array" doesn't specify what to do when the input exceeds the byte array.

When you overflow the buffer, you get a "well, obviously you're supposed to not do that"... but that wasn't stated at all.

When the function keeps going after a newline or a null byte or whatever, there's another "well, obviously you're supposed to stop at those points." That was also not specified.

and so on.

At the point you're specifying all these cases and what to do when, it's so specific and stilted, you might as well be using a programming language.
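To make that concrete, here's the same "read from standard input into this byte array" in Python (buffer size and delimiter choices made up), with the previously unstated decisions spelled out:

    import sys

    buf = bytearray(1024)
    # Decision 1: stop when the buffer is full -- the English version
    # never said what to do when the input exceeds the byte array.
    n = sys.stdin.buffer.readinto(buf)
    # Decision 2: readinto() doesn't stop at a newline or null byte; if
    # that's "obviously" the intent, you have to spell it out yourself.
    end = buf.find(b"\n", 0, n)
    data = bytes(buf[:end if end != -1 else n])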


Actually, programming languages were invented because speaking machine code was too much of a pain in the ass! Programming in English is a natural next step. Ambiguity is not an issue -- you keep speaking until it's resolved.

(We already program in English, in a sense, when we tell humans what we want, and they go code it. Now we'll just be telling machines.)


This type of well-defined language actually predates computers by several thousand years. Even way back in antiquity they used "programming languages" like these to get around the inherent ambiguities of natural language.

Originally as formulaic syllogisms and Aristotelian logic, but then onto other forms of codified language, formal logic etc.

Adding more words often makes things less clear, not more so. What you need is well-defined terms with no overloaded meaning.

> (We already program in English, in a sense, when we tell humans what we want, and they go code it. Now we'll just be telling machines.)

Humans get it wrong all the time though. A great many bugs arise from quite simply misinterpreting the requirements. Which leads to requirements becoming more formulaic and resembling a programming language.


I disagree that logic or math are the same as a programming language; a programming language is defined by the fact that a machine can execute it.

Plus, most math and logic is still communicated and developed in a mix of human languages (English etc.) and ad-hoc, not rigorously defined notation; it's nowhere near the precision of a programming language.

Though you CAN of course grind it out at that level, if you want, but it's very unwieldy and not how people actually work.

If you're looking to deduce some sort of proof from well-defined principles and you wish to eliminate the possibility of error, then sure, well-defined terms (a rigorous language) are useful.

If you're looking to produce a software artifact, just saying what you want in high-level terms and providing iterative natural-language feedback is going to work great and be way nicer than trying to formalize everything.

(Maybe not true for low-level plumbing and things that need to be secure. But for like "build an app", "build a game", "make a shell script that does x", I think it will certainly end up being true.)


Well what you want is one and only one behavior. You're going to need to be specific to the point where what you're specifying is that singular behavior. An interesting example is a binary search, which is easy to informally specify, although it's shockingly difficult to produce a correct implementation. [1]

Given what you want is a singular specific behavior, not just sorta that behavior, but exactly that behavior, it makes sense to use a language that caters to specificity.

As an aside, we've experimented with more natural-language-like programming languages before (COBOL, BASIC). They're a pain in the ass because you need to type a novel to do even trivial things. The terseness of modern programming languages is a feature.

[1] https://ai.googleblog.com/2006/06/extra-extra-read-all-about...


You want one behavior from the perspective of the user's desire, but there will be a very large set of possible programs that will achieve that desire. So you just need to be precise w/r/t the desire, not the nitty-gritty behavior. Traditional programming languages are great at expressing the nitty-gritty behavior, and bad at capturing the desire.

Don't think of LLMs as akin to COBOL or BASIC (or HyperTalk, another great example) -- those are essentially rigid languages with a sprinkling of "Englishness" to try to make them seem less foreign to novice users. And yes, that just gets in the way once you become proficient in rigid languages.

But that's a completely different beast than programming in actual English.

So for example, the reason binary search (a simple algorithm) is hard to implement is because of the mismatch between easy to understand human desire and the pickiness of the rigid language designed for the machine.

Now, it is very easy to implement a correct binary search.

Tell ChatGPT:

> Write a binary search method in C#

It responds:

    public static int Search(int[] arr, int target)
    {
        int left = 0;
        int right = arr.Length - 1;

        while (left <= right)
        {
            int mid = left + (right - left) / 2;

            if (arr[mid] == target)
            {
                return mid;
            }
            else if (arr[mid] < target)
            {
                left = mid + 1;
            }
            else
            {
                right = mid - 1;
            }
        }

        return -1; // Return -1 if the target is not found
    }

We can also ask it:

> Prove it is correct

It responds:

> To prove the correctness of the binary search algorithm, we can use a technique called loop invariant. A loop invariant is a condition that remains true for every iteration of the loop. In this case, the loop invariant is that the target element, if it exists in the array, will always be within the range defined by the left and right pointers.

> Loop Invariant: At the start of every iteration, if the target element exists in the array, it must be within the subarray arr[left, right].

> Initialization: At the beginning, left = 0 and right = arr.Length - 1, which covers the entire array. So, if the target element is present in the array, it must be within the initial range.

> Maintenance: During each iteration, the algorithm examines the middle element (mid = left + (right - left) / 2). There are three cases:

> If arr[mid] == target, the target element is found and returned.

> If arr[mid] < target, the target must be in the right half of the subarray. Therefore, we update left = mid + 1. Since the array is sorted, we are sure the target is in the range arr[mid + 1, right] if it exists in the array.

> If arr[mid] > target, the target must be in the left half of the subarray. Therefore, we update right = mid - 1. Since the array is sorted, we are sure the target is in the range arr[left, mid - 1] if it exists in the array.

> In cases 2 and 3, we have updated either the left or right pointer, maintaining the loop invariant for the next iteration.

> Termination: The loop terminates when left > right. Due to the loop invariant, if the target element exists in the array, it must be in the range arr[left, right]. Since left > right, this range is now empty, which means the target element is not in the array, so the algorithm returns -1.

> In conclusion, the binary search algorithm is correct, as it either returns the index of the target element if it exists in the sorted array, or -1 if it doesn't.

(OK, I'd quibble with that proof a little bit -- it was a little hand-wavy on why the loop must terminate.)

How about the broken binary search you linked to?

Ask ChatGPT:

> Can you find the bug in this java binary search implementation? [Code]

It responds:

> The given binary search implementation appears to be correct, assuming that the input array a is sorted in ascending order. However, there is a potential issue with integer overflow when calculating the middle index mid. To avoid this, you can calculate mid using the following line:

    int mid = low + ((high - low) / 2);

So maybe now you can see how English might be a superior way to program.


Right, I brought it up as an instance of a class of problems that have this property of being easy to specify but difficult to implement correctly. It knows how to implement a binary search because a great many articles have been written about the correct way to implement one, and the pitfalls of this particular problem are very well documented.

It's almost unique in that the problem has a corpus of literature about how difficult it is to implement correctly, which pitfalls are common, and how to solve them. ChatGPT being able to regurgitate this solution is not a good demonstration of its ability to solve general programming problems.


That's my general point: it's easier to say what you want than to implement it in a low-level (relative to English) language. Hence why English is a good programming language.

And LLMs aren't just good at binary search, they're good at lots of things.

Imagine you are in a room with a programmer who is unquestionably better and more expert than you are.

Now let's say you need to write a program. Would you be better off trying to write it yourself, or describing what you want to the better programmer, and letting them write it?

Obviously the latter!

Given a sufficiently advanced compatriot, English is the preferred programming language.

Now, are LLMs good enough? Probably not yet, but getting there rapidly!


And we are back with Cobol haha full circle


You should probably also include all relevant imports. So in C/C++, add all non-standard headers referenced by those files; in other languages, simulate the import system, maybe pruning imported files to just the important parts (type definitions etc.).
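In Python, for instance, the pruning step could be sketched with the stdlib ast module (3.9+ for ast.unparse); a fuller version would recurse into classes to keep method signatures:

    import ast

    def prune_module(source: str) -> str:
        # Keep top-level imports and class/function signatures; replace
        # bodies with "..." so the file fits in the context window.
        kept = []
        for node in ast.parse(source).body:
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                kept.append(ast.unparse(node))
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                   ast.ClassDef)):
                node.body = [ast.Expr(ast.Constant(...))]
                kept.append(ast.unparse(node))
        return "\n\n".join(kept)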


AFAIK, only the older models allow fine-tuning; I'm not sure GPT-4 will let you create your own fine-tuned model, so with the API it will basically work the same as with the chat GUI.


Just remember the API charge is 6c per input token [1]. If you push 32k input tokens in, you're looking at $2000 per API call just as input.

You... might wanna consider a self hosted alternative for that use case, or at least do like, a `| wc` to get an idea of what you're potentially sending before calling the api.

[1] - https://help.openai.com/en/articles/7127956-how-much-does-gp...
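Or, for an actual token count rather than a rough word count, OpenAI's tiktoken library can do it locally - a sketch, assuming cl100k_base is the right encoding for the model you're calling:

    import tiktoken

    # cl100k_base is the encoding used by the GPT-4 family, AFAIK
    enc = tiktoken.get_encoding("cl100k_base")

    def count_tokens(text: str) -> int:
        return len(enc.encode(text))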


6c per thousand tokens, so about $2 per maxed-out API call (32 × $0.06 = $1.92)


oh, you're quite right, I didn't see that it was per 1k tokens. My bad!


Still super expensive. You can't build a business around it at these rates. It should be at least 1000 times cheaper, if not more.


Depends how good it is now, doesn't it? If it actually writes code (it does), which is ‘good enough’ compared to a $25-150/hr human (it does for the lower part of that scale), the $2 is definitely a good business case. For instance, I had to write 1500 lines of JS yesterday because one of my colleagues who took the task could not manage it in the previous 2 days, and the deadline was today. We would've saved about $500 in total by doing it with that expensive API from the start. As it was, it cost nothing besides my hourly wage (it took me a little over an hour), so it made even more sense.


> which is ‘good enough’ compared to a $25-150/hr human (it does for the lower part of that scale),

I'm not exactly sure how you define that scale - perhaps Minecraft bots or the like, where the damage after complete failure is contained to a few dollars or a few hours of human annoyance. I'm sure there are many niches where a 50% success rate of mass-generated programs can earn you big bucks.

But in my experience, Codex does very limited reasoning about code paths. With the current state of the art, you are almost guaranteed to have catastrophic bugs in any non-trivial program engineered by prompt.


When a bad senior or a junior/medior delivers their PR, I (or some other skilled senior) review it, and it has issues which I explain, and we go into a fix loop. Often I just approve it and fix it myself, as the loop takes too long. That is good enough, since that's what's going on in all companies. GPT is the same, only many times faster; the loop is instant, and sometimes I give up and fix it myself, as it's not going to get it, just like some (depressingly many) humans.


Have you worked with ChatGPT to generate the edge cases/test data, then fed that in to get back out a function or whatever?

I tried it with phone number formats and it came up with more than I could.


Stupid question: how could all the crypto-mining infra be repurposed into GPU farms for GPT prompting?


Depends on whether you can build something that can classify which API to call.



