I find it surprising that many developers find using AI prompts easier than an actual programming language (it's not the first time I've heard this, but now it seems to be serious).
When I tried to “talk” to GPTs, it was always hard to formulate my requests. Basically, I want to code. To say something structured, rigorous, not vague and blah-blah-ey. To check and understand every step*. Idk why I feel like this; maybe for me the next ten years will be rougher than planned. Maybe because I'm not a manager type at all? Trust issues? I wonder if someone else feels the same.
Otoh I partially understand the appeal. It's rooted in the fact that our editors and "IDEs" suck at helping you navigate knowledge quickly. Some do it to a great extent, but it's not universal. Maybe a new editor (or plugin) will appear that looks at what you're doing in code and shows the contextual knowledge, relevant locations, article summaries, etc. in another pane. Like a buddy who sits nearby with a beer, sees what you do and is seasoned enough to drop a comment or a story about what's on your screen any time.
Programming in a natural language and trusting the results I just can't stand. Hell, in practice we often can't even negotiate or align ourselves on basic things.
* and I'm not from a slow category of developers
I used to think like you until I actually tried Copilot. I don't even write much boilerplate code (it's Python, and the framework is already set up). It is still an effective and useful tool. It saves mental bandwidth, and the code it writes is exactly what I was intending to write, so it turns me into a reviewer rather than a coder and thus makes it easier to get the job done (I'm still confirming the code does what I want).
This is just GPT-3. With ChatGPT I finally managed to get unit test backbones for the most annoying methods in our code base that had stuck our coverage at 92%. Once we get full Copilot GPT-4 integration we will likely get to 98% coverage in a day's time. That's not nothing.
I agree it's a game changer. I also think once I get access to Copilot X I'll find it irresistible.
Something about the iterative, context-aware approach of chat helps me think through the problem. The only thing I don't like about Copilot is that you can't tell it to augment the output. Sometimes it's close but not perfect.
I think a lot of people get turned off because they test it out on toy projects. IME it becomes a lot more valuable once there's enough context for it to start building off of, rather than just helping you lay early boilerplate.
It depends on the code. If the code is difficult to write, because you need to know a lot of details of other parts, Copilot will struggle.
But you can actually think of this as a code quality measure. You want to avoid this. You want to organize the project in a way that makes writing or modifying the code easy. That means making it straightforward how functions are called and how things are done. Minimize interdependencies and side effects.
Everyone projects their difficulties onto you, so here's mine:
A specification can be rigorous, structured, precise and formal: a declarative description of what is to be done - not how. It is a very different perspective from designing algorithms - even from designing one that provably does what was specified.
But I think trusting the results is misplaced. It's more like being a partner in a law firm, where the articled clerks do all the work - but you are responsible. So you'd better check the work. And doing that efficiently is an additional skill set.
Totally. All I want is an easier time navigating code; not getting lost in tabs and page scrolling, nested class structures and function references, interrupted by context-switching text searches. It feels like I spend 75% of my brain capacity managing my short term memory. Your buddy mentor is a great formulation of the kind of AI aid I need. As for generating code, that's the least of my concerns. I'm actually happy when I get the opportunity to just hammer down some brain dead lines for a change.
Code navigation can be improved in plenty of ways.
I use JetBrains' IDEs daily, but most points can be applied to other IDEs/editors.
Some things that helped me to become a lot more efficient:
- Vim. Since you mostly edit code, not write it, it is reasonable to be in the editing mode by default, with a full keyboard of shortcuts you can use, not only ctrl/alt combinations. I use IdeaVim[1] plugin for Vim emulation.
- Never use your mouse. Keyboard actions are faster and more consistent once you get used to them. IDEs and editors are highly customizable and should let you bind almost any action.
- Get rid of tabs, keep 1-3 active buffers in your vicinity. And then use ctrl+tab or ctrl+e to switch between them.
- Learn tricks for code navigation: jump to definition/usage/implementation, search file/class/symbol.
- Learn tricks for text navigation: pretty much any decent tutorial on Vim navigation will do.
We're on the same page. I too constantly jump between code sections.
Of the many solutions to this, I've realized the most effective could be meta-thinking: self-awareness. I'm quick in the editor - like most of us - and quick at coding. Why? It's a kind of conditioning that comes from time management. Quick thinking, I assume, makes us jump around, like a battle on many fronts: keep your eye on that, then on that. In reality, the code we write in an hour is probably produced in one spark of brain activity; the rest is coordination.
A large part of software development is documentation. It is often overlooked or not kept up to date. I think the great advancement of AI is that we can now more closely link and validate one with the other. You can easily summarize some code with ChatGPT, and also produce some structure of code based on documentation (as an outline, or first cut).
However, this is the state of the art today. In the future, the training set will be based on the prompt-result output and refinement process, leading me to believe that next-generation tools will be much better at prompting the user to provide the details. I've already seen this improvement in GPT-4 recently; I think this is a common and interesting use case.
Overall, these tools will become more and more advanced and useful for developers. Now is a great time to become proficient and start learning how the future will require you to adapt your workflow in order to best leverage ChatGPT for development.
As an infra dude it saves me loads of time. Most of the code I write is stuff where I already know what needs to happen, I already know how the process looks un-automated, and I'm not going to learn a whole lot writing more boilerplate, or learning the names of all the endpoints for a dumb api and parsing the text.
It's still mental work. You still have to read the code over and edit stuff.
It helps me save lots of time so I can do quality things like comment on HN during work.
Infra as code is really where Copilot shines. GPT/Codex isn't great as a state machine, but it's great for solving the "what's that variable called?" or "how's that YAML supposed to be structured?" problems. I think the architect and infra eng distinction will disappear soon as writing IaC becomes nearly instantaneous.
GPT seems really smart and you can give it very high-level prompts, which generate good quality code. But it's limited by the chat interface, and if it doesn't get it right the first time, it's tricky to get a tweaked version. For example, I asked it to generate some code for a login endpoint, and gave it all the details about language, libraries, database etc. It produced some pretty good code. But then I asked it to do a version that didn't store passwords in plaintext. It did that, but also made other changes, including using a different library than the one I had specified. So then I had to try to fix that, which led to more weirdness. I ended up just using the code it generated as inspiration and writing it myself.
Copilot, with its integration into VS Code, is quite different. It's *awesome*. It makes suggestions inline, as you type. It seems to have a lot of context, and generates very good code, apparently based on information contained in other files in the same project. It matches style and naming conventions, and even when it's not quite right, I find it easy to just accept the suggestion and tweak it. It's a huge help.
So my current practice is to rely on Copilot to generate code and use GPT as sort of a pairing partner that I can talk to about the code. It's pretty good at suggesting alternatives, developing ideas, explaining error messages and so on. This is a very, very early attempt at coding with the help of AI, and I haven't actually seen a productivity improvement, because I don't know how to use these tools well. But it's fun and exciting!
I feel like the real value is when working with tools you're not proficient with.
E.g. for a backend dev who hardly knows React and needs to write some, who usually gets by there with small modifications and copy-paste-driven development, ChatGPT / Copilot is basically a 10x productivity multiplier.
I agree that I've not been feeling that much value yet when working in areas I'm proficient in.
> I find it surprising that many developers find using AI prompts easier than an actual programming language (it's not the first time I've heard this, but now it seems to be serious).
I, too, was skeptical until...
I've been making the switch to VS Code (not by choice...my previous daily driver, Atom, has been EOLed since December). Accordingly, I needed to port an Atom extension I wrote a long time ago to VS Code. What I didn't want to do was spend hours or days getting up to speed on how to write VS Code extensions, especially since this is the only one I'm ever likely to write (it's the only one I ever wrote for Atom :-)).
I used GPT-4 and it worked like a charm -- almost like magic. It wrote almost all the boilerplate and bullshittery for me. Was it perfect? No, the regex it came up with for recognizing the code pattern to which the extension applies (Scheme atoms or pairs) was crap. That's okay -- I know how to write regexes (in fact, I pretty much just dropped in my state machine parser from the old extension). But not having to spend hours and hours groveling through Microsoft docs and Googling to learn stuff I'm not likely to ever need again? Man, that was great.
> Programming in a natural language and trusting the results I just can’t stand.
This would worry me, too. But in this case, I was already well familiar with JavaScript and TypeScript, so it was no problem for me to read the resulting code and give it a sanity check, test it, etc. I just didn't have to write it.
I'm not super impressed yet either, but I do think there's a lot of value. I tried a few times to get it to write some simple python scripts, and usually it'd get some things wrong. But it did help to get the basics of the script started.
If you haven't already, look into trying out Copilot (I don't think there's a free version anymore?) or Codeium (free and has extensions for a lot of editors, just be careful with what code you're okay sending to a third party). Using AI prompts as a fancy autocomplete is what has been giving me the most benefit. Something simple like adding a comment "# Test the empty_bank_account method" in an existing file that uses pytest: I go to the next line, hit tab, and it auto-completes a 10-15 line unit test. It's not always right, but it definitely helps speed things up compared to either typing it all out or copy/pasting.
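To give a sense of what that looks like, here's roughly the kind of thing it fills in (the BankAccount class below is a made-up stand-in for the real code, not actual Codeium output):

    class BankAccount:
        """Minimal stand-in for the real class under test; assumed for this example."""

        def __init__(self, balance=0):
            self.balance = balance

        def empty_bank_account(self):
            """Withdraw everything and return the amount removed."""
            withdrawn, self.balance = self.balance, 0
            return withdrawn


    # Roughly the kind of body the completion writes after the comment
    # "# Test the empty_bank_account method":
    def test_empty_bank_account_returns_full_balance():
        account = BankAccount(balance=100)
        assert account.empty_bank_account() == 100
        assert account.balance == 0


    def test_empty_bank_account_when_already_empty():
        account = BankAccount()
        assert account.empty_bank_account() == 0
        assert account.balance == 0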
My biggest annoyance so far, at least with Codeium, is that sometimes it guesses wrong. With LSP-based auto-complete, it only gives me options for real existing functions/variables/etc. But Codeium/copilot like to "guess" what exists and can get it wrong.
> Like a buddy who sits nearby with a beer, sees what you do and is seasoned enough to drop a comment or a story about what’s on your screen any time.
I agree with you, this is probably where it would be most useful (to me). I don't need someone to write the code for me, but I'd love an automated pair-programming buddy that doesn't require human interaction.
I think it has to do with how invested you are in the appearance of the final code. If you're very focused on getting a result and no more, then GPT is probably very powerful. But if you also care about the "how" of the implementation then you'll probably get frustrated micromanaging GPT. I know I do, but I find it's pretty useful when my memory of how to achieve something is blurry: I know what looks wrong but I don't remember how to make it right.
I've been leaning on ChatGPT about 1/4 of the time for programming personal projects. It's a great way to get boilerplate out of the way, as well as quickly introduce me to things I didn't know about. Once I get started, though, I stop using the AI. I'd use it for code reviews, but it isn't smart enough to thoughtfully make comments about how data is pipelined through a complete application (it lacks in other ways, too, but that's the big thing I noticed).
> it isn't smart enough to thoughtfully make comments about how data is pipelined through a complete application
My take is that data-centric programming requires too much context for GPT, and we're going to see a move back to doing things in a more OO way, hopefully with better languages than exist now. The ability to reason locally about objects helps both human and AI developers, so we can build larger systems. Data-centric/functional programs are akin to hand-crafted, artisanal goods that were slowly overtaken by standardized parts and division-of-labor production methods in the industrial revolution, and we are in the middle of one for software [0]. By the end of it, software engineering may no longer be an oxymoron.
The reason I mentioned data pipelines is that I've adopted a compiler-type view of programming, where data is transformed through serial and/or parallel steps until a satisfactory side effect or output is reached. I find it's a lot easier to define a problem when I think about it this way.
This looks quite interesting. So I have a question for these AI-powered editors: what advantage would a dedicated editor like this have over just using an AI plugin for VS Code? How do you fundamentally build the editor differently if you are thinking about AI from the ground up?
Our editor isn't a regular coding editor. You don't actually write code with e2b. You write technical specs and then collaborate with an AI agent. Imagine it more like having a virtual developer at your disposal 24/7.
It's built on completely new paradigms enabled by LLMs. This unlocks a lot of new use cases and features, but at the same time there's a lot that still needs to be built.
Writing technical specs is a fancy way of saying coding. This reads to me like you're writing a new programming language, tightly integrated with an IDE, that targets a new AI-based compiler.
Yes, as always the essential complexity of software is understanding and describing the problem you are trying to solve. Once that is done well, the code falls out fairly easily.
That’s like saying a painting, once they understand what they are trying to paint, just falls out of a painter. It is true but not useful for not-painters.
I think the difference here is that code is effectively a description, so there is an extremely tight coupling between describing the task and the task itself.
You could tell me, in the most painstaking detail, what you want me to paint, and I still couldn't paint it. You can take any random person on the street and tell them exactly what to type and they'd be able to "program".
That's just picking nits with the metaphor. Change it to a poet or a novelist and it works the same. If you tell a person exactly what to write they are just a fancy typewriter, not a poet or novelist. Same with code.
Hmm... Where is the new language in this? The spec is just human language and some JSON for defining structures. It's more that human language is becoming a programming language with the help of AI.
And over time, people will discover some basic phrases and keywords they can use to get certain results from the AI, and find out what sentence structures work best for getting a desired outcome. These will become standard practices and get passed along to other prompt engineers. And eventually they will give the standard a name, so they don’t have to explain it every time. Maybe even version schemes, so humans can collaborate amongst themselves more effectively. And then some entrepreneurs will sell you courses and boot camps on how to speak to AI effectively. And jobs will start asking for applicants to have knowledge of this skill set, with years of experience.
Until one day a new LLM gets released, GPT5, that doesn't recognize any of those special words. Mastering prompt-speak is essentially mastering undefined behaviors of C compilers.
GPT-4 won't know anything about GPT-5; you would have to make a sophisticated prompt for GPT-4 that converts its quirks into GPT-5's quirks, but if you know so much about both LLMs, why not use GPT-5 directly?
The idea is someone would first make a prompt for GPT-4 that outputs GPT-5-enabled prompts. You would initialize GPT-4 with it, and then speak to GPT-4 to compile prompts into GPT-5 context, which then gets fed to GPT-5.
Although you may know about LLMs, you might specialize in speaking to specific models and know how to get optimal results based on their nuances.
>And over time, people will discover some basic phrases and keywords they can use to get certain results from the AI, and find out what sentence structures work best for getting a desired outcome.
This just sounds like a language that is hard to learn, undocumented, and hard to debug
I'm not sure I follow this answer. What are the entirely new paradigms? Writing is still the initial step. If text editing remains a core part of the workflow, why restrict the user's ability to edit code?
> You don't actually write code with e2b. You write technical specs and then collaborate with an AI agent.
If I want to change 1 character of a generated source file, can I just go do that or will I have to figure out how to prompt the change in natural language?
> How do you fundamentally build the editor differently if you are thinking about AI from the ground up?
Great question. I would love to hear the devs' thoughts here. This is one of those questions where my intuition tells me there may be a really great "first principles" type of answer, but I don't know it myself.
If you could use it without submitting data to some ai company, or if it came with a non-disgusting terms of service, that would be a killer feature for me.
For example, the last AI company installer I just clicked "decline" to (a few minutes ago) says that you give it permission to download malware, including viruses and trojans, onto your computer, and that you agree to pay the company if you infect other people and tarnish their reputation because of it. Literally. It was a very popular service too. I didn't even get to the IP section.
edit: those terms aren't on their website, so I can't link to them. They are hidden in that tiny, impossible to read box during setup for the desktop installer
I built this https://github.com/campbel/aieditor to test the idea of programming directly with the AI in control. Long story short, VS Code plugin is better IMO.
In essence, when working with code stops being the major thing you do (you abstract that away) and you start managing the agents working on code and writing the spec, you need new tools and a working environment to support that.
> and start managing the agents working on code ... you need new tools
Jira?
Only slightly joking. It really sounds like we're moving in the direction of engineers being a more precise technical version of a PM, but then engineers could just learn to speak business and we don't need PMs.
Hey everyone, I quite frankly didn't expect our project to get on the HN front page 2 minutes after posting. I'm one of the creators of this project.
It's pretty nascent and a lot of work needs to be done so bear with us please. I'm traveling in 30 minutes but I'm happy to answer your questions.
A little about my co-founder and myself:
We've been building devtools for years now. We're really passionate about the field. The most recent thing we built is https://usedevbook.com. The goal was to build a simple framework/UI for companies to demo their APIs. Something like https://gradio.app but for API companies. It's been used for example by Prisma - we helped them build their playground with it - https://playground.prisma.io/
Our new project - e2b - is using some of the technology we built for Devbook in the past. Specifically the secure sandbox environments where the AI agents run are our custom Firecracker VMs that we run on Nomad.
If you want to follow the progress you can follow the repo on GH, or you can follow my co-founder, me, and the project on Twitter:
(Off-topic) Is there an "Are we open source yet?"-type site[1] that follows the progress of the various open-source LLMs? This is the first I've heard of GPT4All. I'm finding it tough to keep up with all these projects!
Not what you're looking for but I built a page a while back to keep track of Stable Diffusion links using my little website-builder side-project protocodex.com - https://protocodex.com/ai-creation
You're welcome to use it if you want to get a link page started, and I'd be glad to help - you can also add comment sections on the page to get user input/contributions so if anyone else has some links they can comment them there.
I eventually want to more fully formalize user contributions to pages so that they can be used as crowdsourced freeform sites, if there's enough interest out there.
Sorry if it's a dumb question, since it's quite hard to keep up with all the recent developments in custom GPT/LLM solutions.
Do I understand correctly that GPT4All provides a delta on top of some LLaMA model variant? If so, does one need to first obtain the original LLaMA model weights to be able to run all the subsequent derivations? Is there a _legal_ way to obtain them without being a published AI researcher? If not, I'm not sure that GPT4All is viable when looking for legal solutions.
My biggest concern with tools like this is reproducibility and maintainability. How deterministically can we go from the 'source' (natural language prompts) to the 'target' (source code)? Assuming we can't reasonably rebuild from source alone, how can we maintain a link between the source and target so that refactoring can occur without breaking our public interface?
This is a valid concern and we are still experimenting with how to do this right.
A combination of preserving the reasoning history, having the generated code, and using tests to enforce the public interface (and fix it if anything breaks) looks promising.
I think the crucial part is indeed not deterministically going from NL to code, but taking an existing state of the codebase and spec and "continuing the work".
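To make the test part concrete, a contract-style test that pins the spec-level interface might look roughly like this (the module and function names are hypothetical examples, not actual e2b output):

    import pytest

    # Assumed layout: the agent-generated code lives in a users module; these
    # names are made-up examples for illustration.
    from users import DuplicateEmailError, create_user


    def test_create_user_returns_an_id():
        # The spec says create_user returns an id for the new user.
        assert create_user(email="a@example.com", name="Ada") is not None


    def test_duplicate_emails_are_rejected():
        # The spec says signing up twice with the same email must fail.
        create_user(email="b@example.com", name="Bob")
        with pytest.raises(DuplicateEmailError):
            create_user(email="b@example.com", name="Bobby")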
Pretty simple: it's just like any abstraction. This AI will not work where nobody has delivered the answer beforehand. LLMs are given existing code as input. When you abstract that away, you'd better hope there's good code in it.
So my question would be, what is the use case?
I guess it's more like planning software and not implementing it.
You can plan your software pretty well with ChatGPT. But it will just help you; it won't really do the job.
> mlejva: Our editor isn't a regular coding editor. You don't actually write code with e2b.
then what licensing problems arise from its use? In theory, if you only prompt the AI to write the software, is the software even your intellectual property?
It seems like this is a public domain software printing machine if you really aren’t meant to edit the output.
Good observation. I did notice that a lot of the types that jumped from framework to framework are now jumping onto AI. Please keep them there; at least that way JS will stand a chance at becoming sane.
What do you think these fuzzy interpreters are gonna be used for? Machine code running on metal? It's gonna be all scripted servers and web apps for SaaS startups slurping that VC or UE money.
Nothing yet - so far it's just a basic Electron app where you select which files to send as reference for the feature you want to add; it then streamlines the process of applying the edits ChatGPT sends back.
I’m not really planning on turning it into a product. It sounds like this guy is a lot farther along than me if you’re looking for a competitor - I think you’re going to have plenty.
https://mobile.twitter.com/codewandai
Are there examples / case studies of more complex apps being built by LLMs? I've seen some interesting examples, but they were all small and simple. I'd love to see more case studies of how well these tools perform in more complex scenarios.
My gut feeling is we're still a few LLMs generations away from this being really usable but I'd love to hear how the authors are thinking about this.
Can you give an example of complex? I've used ChatGPT to help me build an app that authenticates a user using OAuth. That information creates a user in the backend (Rails). That user can then import issues tagged with specific information from a 3rd-party task management tool (Linear). The titles for these issues are then listed in the UI. From there, the user can create automatic release notes from those issues. They can provide a release version, description, tone, audience, etc.
All of that (issues list, version, tone, etc) is then formulated into a GPT prompt. The prompt is structured such that it returns written release notes. That note is then stored and the user can edit it using a rich text editor.
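Roughly, the prompt assembly looks something like this (a simplified sketch; the exact field names and wording here are stand-ins, not the real thing):

    def build_release_notes_prompt(issues, version, description, tone, audience):
        # Fold the collected fields into one prompt string; the wording is a
        # simplified stand-in for what the app actually sends.
        issue_lines = "\n".join(f"- {title}" for title in issues)
        return (
            f"Write release notes for version {version}.\n"
            f"Audience: {audience}. Tone: {tone}.\n"
            f"Release description: {description}\n"
            f"Issues shipped in this release:\n{issue_lines}\n"
            "Return the notes as short markdown sections."
        )


    prompt = build_release_notes_prompt(
        issues=["Fix login redirect loop", "Add dark mode"],
        version="1.4.0",
        description="Quality-of-life improvements",
        tone="friendly",
        audience="end users",
    )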
Once the first note is created the system can help the user write future notes by predicting release version, etc.
This isn’t that complex imo, but I’m curious to see if this is what people consider complex.
How about a 2-million-line legacy app spanning 5 languages, including one created by a guy who left the company 14 years ago, which has a hand-rolled parser and is buggy.
A Line Of Business app. With questionable specs. Where inputs are cross-dependent and need to be filtered. Some fields being foreign keys to other models.
> Line of business (AKA LOB) is a term that describes a business’s product or service, the resources used, and the process for delivering value to a market segment. It could be the primary or one of the main processes that bring revenue.
> For example, manufacturing dry-erase markers is a line of business. Everything that happens from concept, developing the markers, marketing, selling, to fulfillment, and staying competitive makes up the business line. So, a LOB could also describe a product line.
I don't have a specific definition of complex in mind. Seeing more examples of this with the prompts used + output and the overall steps is exactly what I'm asking for. I'm particularly interested in how the success rate changes as the code base evolves. Are LLMs effective in empty repos? Are they effective on large repos? Can prompts be tweaked to work on larger repos?
Around 3 hours (not straight - I would hack on it for 30 minutes to an hour at a time). I spent another 1.5 hours or so styling it, but I did that outside of ChatGPT.
I've been expecting to be replaced by a much, much cheaper developer in another country since I graduated college ... three decades ago. I'm still not 100% certain why that hasn't happened.
I suspect it has to do with the equivalent of prompt engineering: it's too difficult to cross the cultural and linguistic barriers, as well as the barrier of space that could have mitigated the other two. By the time you've directed somebody to do the work with sufficient precision, you could have just done it yourself.
And it's part of the reason we keep looking for that 10x superdeveloper. Not just that they produce more code per dollar spent, but that there is less communication overhead. The number of meetings scale with the square of the number of people involved, so getting 5 people for the same price as me doesn't actually save you time or money.
I have no idea what that means for AI coding. Thus far it looks a lot like that overseas developer who really knows their stuff and speaks perfect English, but doesn't actually know the domain and can't learn it quickly. (Not because they aren't smart, but because of the various human factors I mentioned.)
I'd be thrilled to be completely wrong about that -- in part because I've been mentally prepared for it for so long. I hope that younger developers get a chance to spin that into a whole new career, hopefully of a kind I can't even imagine.
> By the time you've directed somebody to do the work with sufficient precision, you could have just done it yourself.
And it's much slower, because "do it" includes the trial-error-decision cycle, which is fast when you're alone and takes weeks if you are directing and [mis]communicating. Also wondering where this goes and how big of a bubble it is/will be.
While you mention that you can bring your own model, prompt, etc, the current main use case seems to be integrating with OpenAI. How, if at all, do you plan to address the current shortcoming that the code generated by it often doesn't work at all without numerous revisions?
Good point and feedback, thank you. We'll update the readme.
A lot of UX, UI, DX work related to LLMs is completely new. We ourselves have a lot of new realizations.
> While you mention that you can bring your own model, prompt, etc, the current main use case seems to be integrating with OpenAI
You're right. This is because we started with OpenAI, and to be fair it's the easiest to use from the DX point of view. We want to make it more open very soon. We probably need more user feedback to learn how this integration should look.
> How, if at all, do you plan to address the current shortcoming that the code generated by it often doesn't work at all without numerous revisions?
The AI agents work in a self-repairing loop. For example they write code, get back type errors or a stack trace and then try to fix the code. This works pretty well and often a bigger problem is short context window.
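A simplified sketch of the shape of that loop, with a generic `llm` callable standing in for the actual agent (this shows the idea, not our real implementation):

    import subprocess
    import sys


    def self_repair_loop(llm, spec, max_attempts=5):
        # `llm` is any callable that maps a prompt string to generated code.
        feedback = ""
        for _ in range(max_attempts):
            code = llm(
                f"Spec:\n{spec}\n\nErrors from the previous attempt (if any):\n"
                f"{feedback}\nReturn only runnable Python code."
            )
            with open("generated.py", "w") as f:
                f.write(code)
            result = subprocess.run(
                [sys.executable, "generated.py"],
                capture_output=True, text=True, timeout=60,
            )
            if result.returncode == 0:
                return code  # ran cleanly, stop iterating
            feedback = result.stderr  # hand the stack trace back to the model
        raise RuntimeError("no working code within the attempt budget")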
We don't think this will replace developers, rather we need to figure out the right feedback loop between the developer and these AI agents. So we expect that developers most likely will edit some code.
I think there's a big gap here you might be missing. Most developers beyond juniors can generally write code that at least compiles on the first pass, even if it isn't functionally correct. Current AI models often generate code that doesn't even compile.
> Most developers beyond juniors can generally write code that at least compiles on the first pass, even if it isn't functionally correct.
Hahaha. I’ve been coding for over 20 years and this is definitely not the case.
> Current AI models often generate code that doesn't even compile.
Most of the code ChatGPT has given me, has run/compiled on the first try. And it’s been a lot longer and complex than what I would have written on a first pass.
Let’s just learn to use these tools instead of trying to justify human superiority.
I've found with my own projects that the AI allows me to focus on the interesting stuff. I was porting some quant work from a Jupyter notebook into a flask app, and the AI is quite handy at providing boilerplate for stuff that isn't relevant to what I want to work on. I need a site that can display charts for a given analysis of an indicator, and ChatGPT handled that for me quite handily.
> Most developers beyond juniors can generally write code that at least compiles on the first pass,
Aha. Maybe you know super clever people or people who learned in the 60-80s when cycles (and reboots etc) mattered or were costly; this is incredibly far from the norm now.
Is that a problem though? In this IDE the LLM sees the error message and tries to fix it, possibly while the developer who wrote the prompt is off doing something else.
I made one that tries to get to the end code by having 3.5 and 4 play different roles and correct each other. Sometimes it works; mostly it loops, unable to get to the end.
Based on my experiences having spent a bunch of time experimenting with LLMs for writing code, I think the hardest part that I haven't yet seen a solution to is modifying existing code bases.
Sure, if you're doing greenfield, just ask it to write a new file. If you only have a simple script, you can ask it to rewrite the whole file. The tricky bit however, is figuring out how to edit relevant parts of the codebase, keeping in mind that the context window of LLMs is very limited.
You basically want it to navigate to the right part of the codebase, then scope it to just the part of a file that is relevant, and then let it rewrite just that small bit. I'm not saying it's impossible (maybe with proper indexing with embeddings, and splitting files up by e.g. functions, you can make it work), but I think it's very non-trivial.
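The retrieval half of that is the more tractable part; something like the sketch below, where `embed` stands in for whatever embedding API you'd use and each chunk is, say, a single function:

    def rank_chunks(query, chunks, embed, top_k=3):
        # `chunks` are small pieces of the codebase (e.g. one function each);
        # `embed` is a placeholder for any embedding API that maps a string
        # to a list of floats.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
            return dot / norm if norm else 0.0

        query_vec = embed(query)
        ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
        return ranked[:top_k]

The hard part is everything around it: deciding what counts as a chunk, and stitching the rewritten piece back into the file without breaking the rest.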
Anyway, good luck! I hope you'll share your learnings once you figure this out. I think the idea of putting LLMs into Firecracker VMs to contain them is a very cool approach.
I think this will get easier as the supported context sizes get larger. Of course the tooling needs to take care of most of the drudgery, but I'm not sure there's any underlying limitation of an LLM that makes refactoring existing code any different from creating new code. It's just a matter of feeding it the layout of the repo and the code itself in a way that the LLM can retrieve and focus on the relevant files when you ask it for help with a task.
Oh yeah, I agree with the refactoring bit. As I said, rewriting a file (section) works great if your file fits into the context window.
But context windows are far from large enough to fit entire repos, or even entire files (if they're big). I'm not sure how hard just scaling up the context window is beyond the current early access of OpenAI GPT-4.
> in a way that the LLM can retrieve and focus on the relevant files
This I think is something we haven't really figured out yet, esp. if a feature requires working on multiple files. I wouldn't be surprised if approaches based on the semantic level (actually understanding the code and the relationships between its parts, not just the textual representation of it) end up being needed here.
I agree retrieval will need to be aligned with the semantic layout of the codebase. But that should be pretty straightforward, given the number of static analysis and refactoring tools we already have available to us and use daily as part of our IDE workflows.
This also implies that the first codebases to really benefit from LLM collaboration will be those written in strongly typed languages which are already amenable to static analysis.
And in terms of context windows, it's not like humans keep the entire codebase in their head at all times either. As a developer, when I'm focused on a single task, I'm only ever switching between a handful of files. And that's by design; we build our codebases using abstractions that are understandable to humans, given our limited context window, with its well-known limit of about seven simultaneous registers. So if anything, perhaps the risk of introducing an LLM to a codebase is that it could create abstractions that are more complicated than what a human would prefer to read and maintain.
This is actually possible, I made a proof of concept agent that does exactly this. The agent has access to several commands that get executed on demand - ListFiles, ReadFile and WriteFile. You give it a prompt - for example "add a heading to the homepage and make it yellow" and it will use ListFiles to locate the correct file, then use ReadFile to read it and then finally use WriteFile to write the new contents of that file.
It was just a PoC so it wasn't bulletproof and I only ever made it work on a small Next.js codebase (with every file being intentionally small so it fits the context window) but it did work correctly.
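For anyone curious, the execution side of that loop is tiny; roughly the sketch below (the prompting side that decides which command the model emits is the part I'm leaving out):

    from pathlib import Path


    def list_files(root="."):
        return [str(p) for p in Path(root).rglob("*") if p.is_file()]


    def read_file(path):
        return Path(path).read_text()


    def write_file(path, contents):
        Path(path).write_text(contents)
        return f"wrote {path}"


    COMMANDS = {"ListFiles": list_files, "ReadFile": read_file, "WriteFile": write_file}


    def execute(command, *args):
        # Run a command the model asked for and return the result so it can
        # be fed back into the next prompt.
        return COMMANDS[command](*args)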
All these tools start with a grand proclamation of "open", and then the first thing you notice is the field to add your OPENAI_KEY. My humble suggestion: if you are building something truly open, please use some other model like LLaMA or BERT as the default example and keep options for adding other models as needed.
Hey, thank you for the feedback. I understand that requiring an OpenAI key isn't good for folks. The project is 4 weeks old and OpenAI is what we started with as an LLM. We want to create an interface for people to bring and connect their own models, though. We will do a better job of explaining this in our readme.
I wouldn’t worry too much about it; there will be more and more models and fine-tuning services and fine-tuned downloads. Different models to mix and match.
> The current idea is to offer the base cloud version for free while having some features for individuals behind a subscription. We'll share more on pricing for companies and enterprises in the future.
What happens if you use the README.md and associated documentation as a prompt to re-implement this whole thing?
Maybe to spite Microsoft? I would love to completely remove Microsoft from my workflow. The way telemetry creeps into everything bugs the hell out of me. I also really don't like the security model of VS Code, where it's assumed that any code you look at comes from a trusted source and might execute code on your local system. That's a ridiculous security assumption for a text editor, but not surprising considering that these people also made MS Office.
I dislike telemetry as much as the next guy, but it is a little funny to complain about it on the post for this editor, which instead sends your entire codebase to OpenAI.
When I run VS Code on Mac and open some new non-trusted directory, it literally forces me to click "yes" in a dialog that says something like "do you trust stuff in this directory to be able to run code?" Is this a Mac-only thing, and maybe you use Windows or Linux? Cuz I found your security concern addressed to an almost obnoxious extent…
I don't think the integration with VS Code would work well here. We're ditching a lot of existing workflows for a completely new interaction between the developer and the agent. It's not like Copilot at all. E2b is more like having a virtual developer at your disposal 24/7. You write a technical spec and e2b builds the app for you.
This is awesome, scary and very interesting. But, for me, it comes with a personal concern:
For some time I've been giving serious thought about an automated web service generator. Given a data model and information about the data (relationships, intents, groupings, etc.) output a fully deployable service. From unit tests through container definitions, and everything I can think of in-between (docs, OpenAPI spec, log forwarder, etc.)
So far, while my investment hasn't been very large, I have to ask myself: "Is it worth it?"
Watching this AI code generation stuff closely, I've been telling myself the story that the AI-generated code is not "provable". A deterministic system (like I've been imagining) would be "provable". Bugs or other unintended consequences would be directly traceable to the code generator itself. With AI code generation, there's no real way to know for sure (currently).
Some leading questions (for me) come down to:
1. Are the sources used by the AI's learning phase trustworthy? (e.g. When will models be sophisticated enough to be trained to avoid some potentially problematic solutions?)
2. How would an AI-generated solution be maintained over time? (e.g. When can AI prompt + context be saved and re-used later?)
3. How is my (potentially proprietary) solution protected? (e.g. When can my company host a viable trained model in a proprietary environment?)
I want to say that my idea is worth it because the answers to these questions are (currently) not great (IMO) for the AI-generated world.
But, the world is not static. At some point, AI code generators will be 10x or 100x more powerful. I'm confident that, at some point, these code generators will easily surpass my 20+ years of experience. And, company-hosted, trained AI models will most likely happen. And context storage and re-use will (by demand) find a solution. And trust will eventually be accomplished by "proof is in the pudding" logic.
Basically, barring laws governing AI, my project doesn't stand a cold chance in hell. I knew this would happen at some point, but I was thinking more like a 5-10 year timeframe. Now, I realize, it could be 5-10 months.
Not OP but I've been playing with similar technology as a hobby.
>1. Are the sources used by the AI's learning phase trustworthy? (e.g. When will models be sophisticated enough to be trained to avoid some potentially problematic solutions?)
Probably not, but for most domains reviewing the code should be faster than writing it.
>2. How would an AI-generated solution be maintained over time?
I would imagine you don't save the original prompts. Rather, when you want to make changes you just give the AI the current project and a list of changes to make. Copilot can do this to some extent already. You'd have to do some creative prompting to get around context size limitations, maybe giving it a skeleton of the entire project and then giving actual code only on demand.
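For a Python project, that skeleton can be built cheaply by stripping each file down to its signatures; a rough sketch of the idea:

    import ast
    from pathlib import Path


    def file_skeleton(path):
        # Keep only class and function signatures so a whole project can fit
        # into a single prompt.
        tree = ast.parse(Path(path).read_text())
        lines = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"def {node.name}({args}): ...")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"class {node.name}: ...")
        return "\n".join(lines)


    def project_skeleton(root):
        return "\n\n".join(
            f"# {p}\n{file_skeleton(p)}" for p in Path(root).rglob("*.py")
        )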
> When can my company host a viable trained model in a proprietary environment?
Hopefully soon. Fine-tuned LLaMA would not be far off GPT-3.5, but nowhere close to GPT-4. And even then there are licensing concerns.
1> Relying on code reviews has concerns, IMO. For example, how many engineers actually review the code in their dependencies? (But, I guess it wouldn't take that much to develop an adversarial "code review" AI?)
2> Yes, agreed, that would work. Provided the original solution had viable tests, the 2nd (or additional) rounds would have something to keep the changes grounded. In fact, perhaps the existing tests are enough? Making the next AI version of the solution truly "agile"?
3> So, at my age (yes, getting older) I'm led to a single, tongue-in-cheek / greedy question: How to invest in these AI-trained data sets?
> Given a data model and information about the data (relationships, intents, groupings, etc.) output a fully deployable service. From unit tests through container definitions, and everything I can think of in-between (docs, OpenAPI spec, log forwarder, etc.)
AWS roughly has one of these in Amplify. The data mapping parts are pretty great, though a lot of the rest of it sucks. The question I'd ask is whether those parts suck by nature of the setup, or whether it's just Amplify that has weird assumptions.
IMO Emacs is a perfect candidate for this kind of thing, or maybe something akin to LSP so you can bring your own editor. New GPT extensions are coming out daily for Emacs, e.g. https://github.com/xenodium/chatgpt-shell
e2b isn't like copilot. You don't really write code in our "IDE". Currently, it works in a way that you write a technical spec and then collaborate with an AI agent that builds the software for you.
It's more like having a virtual developer available for you.
Maybe until we all have local LLMs and custom models with full control, this level of abstraction (prompting) is not useful.
I refuse to contribute to the "Open"AI scheme. Let marketing folks and teens give them data. :)
I agree with your point about self hosting as long term strategy. However at current stage, it’s still a balance between capability and control. Greater control means you lag behind on bleeding edge features (gap seems to be constantly narrowing here though for LLMs thanks to tons of OSS efforts).
Literally states it won’t use your data. Ofc there’s non trivial risk that this policy will change over time. Still, I don’t feel like there’s any huge lock-in risk with OpenAI right now, so advantage of using it outweighs the risk for most.
I've tried GPT-4 for simple things and ended up doing them myself. A simple regex for matching whether a URL is a link to the root of the domain or to some page. I told it several times to try again and it failed.
How are you folks looking beyond something that is so simple and should always be correct to be useful? How does it save you time if any small detail may be a potential problem?
I suspect that regex is very much like mathematics for these models: they're just bad at it. If you provided the model failure cases for each failed attempt, you might get better output. I've also seen folks say that requiring reasoning makes answers more robust as well.
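For reference, the non-AI version of the check being asked for is small enough to write by hand; something like this, assuming "root of the domain" means no path, query string, or fragment:

    import re

    # Matches scheme://host[:port], optionally followed by a bare "/",
    # with nothing after it (no path, query string, or fragment).
    ROOT_URL = re.compile(r"^https?://[^/?#]+/?$")


    def is_domain_root(url):
        return bool(ROOT_URL.match(url))


    assert is_domain_root("https://example.com")
    assert is_domain_root("https://example.com/")
    assert not is_domain_root("https://example.com/about")
    assert not is_domain_root("https://example.com/?q=1")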
It's perfectly possible to assemble a project by writing individual files one at a time, so you would basically get 8k tokens per file. Or you could even write out the files in parts.
Sure, but in that case how does it keep track of what it has already output? What stops it 'hallucinating' functions in files that it has already produced?