Hacker News | afshinmeh's comments

Genuinely wondering though: is the problem that the patch was vibe coded, or is it that no one reviewed the changes?

"Vibe coding" implies the changes weren't reviewed. That's the most common definition of the term.

Even if the developer himself didn't say that, though, it's safe to assume no AI-generated commit beyond a very small size is ever properly reviewed (in the sense that the entire code is actually understood), because doing so would take longer than actually writing the code by hand like a caveman.


what does "LLM-native code understanding" mean in this context?

Thank you for checking out VT Code! “LLM-native code understanding” refers to VT Code's approach of using an LLM as the primary mechanism for semantic code analysis rather than relying solely on traditional static analysis tools. I have tried using ast-grep for structured code parsing, as a ground truth before and after the agent runs a code analysis or a code edit/write operation, and for code context understanding and symbol analysis. I also tried using tree-sitter to enhance the grammar of the user prompt parser. Example: currently I use the tree-sitter bash grammar to check user prompts for Unix commands: “run cargo fmt” -> VT Code detects right away that the intent is to run a bash command -> parses it and hands it to the harness -> waits for the stdout/stderr. Then it passes the output to the LLM as part of the agent loop. This saves context and a parser round trip.
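To make the "run cargo fmt" flow above concrete, here is a minimal sketch. It is not VT Code's implementation: a plain prefix check plus `shlex` stands in for the tree-sitter bash grammar, and the names `RUN_PREFIX`, `detect_command`, and `run_and_capture` are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional

# Hypothetical trigger word; the comment above describes using a
# tree-sitter bash grammar for this step instead of a prefix check.
RUN_PREFIX = "run "


def detect_command(prompt: str) -> Optional[List[str]]:
    """Return an argv list if the prompt looks like 'run <command>', else None."""
    text = prompt.strip()
    if not text.lower().startswith(RUN_PREFIX):
        return None
    try:
        argv = shlex.split(text[len(RUN_PREFIX):])
    except ValueError:
        # Unbalanced quotes: treat the prompt as ordinary text for the LLM.
        return None
    return argv or None


def run_and_capture(argv: List[str]) -> dict:
    """Run the command directly and collect the output to hand to the LLM."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return {
        "exit_code": done.returncode,
        "stdout": done.stdout,
        "stderr": done.stderr,
    }
```

With this shape, `detect_command("run cargo fmt")` yields `["cargo", "fmt"]`, and only the captured output goes into the model's context, which is where the saved round trip would come from.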

This is just my naive implementation, so as for “LLM-native code understanding”: VT Code will use LLMs to perform deep code understanding across multiple programming languages as a fallback if my enhanced `ast-grep` + ripgrep + tree-sitter implementation fails, but this relies on the model's intelligence. If you followed the post-training breakthroughs at the end of last year (the GPT-5.1 and Opus 4.5 era, November 2025), I read somewhere from Anthropic and OpenAI researchers that the models are now smart enough to understand code with more context. They even have their own internal monologue, so they can reason about code grammars and code context by themselves. https://github.com/vinhnx/VTCode/blob/a154162f/docs/README.m...
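The fallback order described above (structural tools first, LLM only if they fail) could be sketched roughly like this. The `Analyzer` type and the `analyze` function are hypothetical names for illustration, not part of VT Code.

```python
from typing import Callable, Iterable, Optional

# Each analyzer takes a query and returns a result, or None if it cannot answer.
Analyzer = Callable[[str], Optional[str]]


def analyze(query: str, structural: Iterable[Analyzer], llm_fallback: Analyzer) -> Optional[str]:
    """Try the structural tools in order; call the LLM only if all of them miss."""
    for tool in structural:
        try:
            result = tool(query)
        except Exception:
            result = None  # a crashing tool counts as a miss
        if result is not None:
            return result
    return llm_fallback(query)
```

The structural analyzers here would wrap ast-grep, ripgrep, or tree-sitter calls; the point of the ordering is that the model is only consulted when the cheaper tools have nothing to say.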

Note: I don't understand this well enough to describe it cleanly, as I mostly learn by doing. However, when I initially designed and built VT Code, I had a vision of using AST-enhanced grep as a replacement for standard grep. I also use my own grep tool, `perg`. I also wanted to parse source code into concrete syntax trees usable in compilers, interpreters, text editors, and static analyzers. Also, I thought of using LSP but still exp. All this might be overhead for a small open source coding harness, but I love to build, so I thought to myself, why not, just build and learn.


I love SQLite and thanks for sharing it but there should be a "(2018)" at the end in the title:

> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.


FYI, they added a lot more formats to the list after that.

  Preferred
  
  1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
    a. Formats using well known schemas with public validation tool available
    b. Line-oriented, e.g. TSV, CSV, fixed-width
    c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
  
  2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
  
  3. Character Encoding, in descending order of preference:
    a. UTF-8, UTF-16 (with BOM),
    b. US-ASCII or ISO 8859-1
    c. Other named encoding
  
  ---
  
  Acceptable
  
  For data (in order of preference):
  
  1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
  2. Text-based data formats with available schema
  
  For aggregation or transfer:
  
  1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.
https://www.loc.gov/preservation/resources/rfs/data.html


.7z being there just discredits the entire process. The underlying compression algorithm is a free-hand one and can be anything[0], or contain bugs and exploits[1]. Personally I use only zstd with .7z which is 'non-standard' by the official (Russian) release.

[0]: https://7-zip.org/7z.html

[1]: CVE-2025-0411


I love using zstd, it's so fast to decompress. I especially like that the JavaScript decoder is 8kb and still really fast. Though the 25kb wasm decoders are about twice as fast.

What are the advantages or reasons to use zstd in a 7z container versus just .zst?


I love zstd as much as the next guy and I do use zstd solo for the most part. I gave a talk on it a few years back too (incl. using the lib directly from Java, massively decreasing log storage, and so on).

Why use it with 7-Zip, though? 7-Zip archives multiple files/directories and supports encryption. It has a UI too. On Windows there is NanaZip, available in the Microsoft Store, which has been graced by corporate for user install (unlike zstd, which effectively needs WSL), and most folks won't be able to use the command-line tool.

Of course, using tar with zstd is always an option if you are on Linux.


That's exactly how I tried to address that problem with https://github.com/afshinm/zerobox -- you control what network access (e.g. `--deny-net *.amazonaws.com`) your agent has and you also get snapshotting out of the box.

That said, using LakeFS is probably a better long term solution and I like this approach.
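For readers wondering what a deny pattern like `*.amazonaws.com` means in practice, here is a minimal sketch of glob-style host matching. How zerobox actually evaluates `--deny-net` is not stated here, so this is an assumption for illustration only, and `is_denied` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import List


def is_denied(host: str, deny_patterns: List[str]) -> bool:
    """Return True if the host matches any glob-style deny pattern."""
    # Normalize case and drop a trailing dot from fully qualified names.
    normalized = host.lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern.lower()) for pattern in deny_patterns)
```

Under this reading, `s3.amazonaws.com` is blocked while the bare `amazonaws.com` is not, because the pattern requires something before the dot. A real sandbox would have to decide that edge case explicitly.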


Curious, what format would you prefer to use to represent a workflow instead of YAML?


Type-safe code. Workflows are not configuration! If I wanted YAML hell I could stick to GitHub Actions.

But that's only the start. There are a lot of other things I would expect of a new workflow orchestrator in 2026 so if you are not comparing yourself to the competition you probably don't know what you're getting yourself into.


Yeah, that makes sense. I looked at a few workflow orchestrators and I'm building something that I will release soon, but my thinking is that the "workflow engine" should be an abstraction that takes the input and executes the steps. "What" you use to define that workflow is probably the SDK layer, but I can certainly see the value in using type-safe code to define it as opposed to a YAML file.

I'm mainly focusing on the portability aspect of it (e.g. use TS/Python/etc. to define the workflow/steps, or just a simple YAML file).


Are you planning to map those varied definitions onto varied orchestrators?


Sort of. My thinking is that the input to define the workflow should be anything you prefer to use (TS, Go, YAML, etc.) and the orchestrator's job is to model that and execute the job, given your deployment model.
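One way to picture that split between "how the workflow is defined" and "what the engine executes" is a shared model that both front ends build. This is only a sketch under assumptions: `Step`, `Workflow`, `from_document`, `execute`, and the `ACTIONS` registry are hypothetical names, and JSON stands in for YAML to stay within the standard library (a YAML loader would produce an equivalent dict).

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    action: str  # key into the action registry
    args: Dict[str, object] = field(default_factory=dict)


@dataclass
class Workflow:
    name: str
    steps: List[Step]


def from_document(doc: dict) -> Workflow:
    """Build the model from a parsed declarative file."""
    steps = [Step(s["name"], s["action"], s.get("args", {})) for s in doc["steps"]]
    return Workflow(doc["name"], steps)


def execute(wf: Workflow, actions: Dict[str, Callable[..., object]]) -> Dict[str, object]:
    """Run steps in order and collect each result under its step name."""
    results: Dict[str, object] = {}
    for step in wf.steps:
        results[step.name] = actions[step.action](**step.args)
    return results


# Hypothetical action registry for the demo.
ACTIONS = {"add": lambda a, b: a + b, "echo": lambda text: text}

# The same workflow, defined declaratively and in code.
declarative = from_document(json.loads(
    '{"name": "demo", "steps": [{"name": "sum", "action": "add", "args": {"a": 1, "b": 2}}]}'
))
in_code = Workflow("demo", [Step("sum", "add", {"a": 1, "b": 2})])
```

Both definitions produce the same `Workflow` value, so the engine never needs to know which front end was used. The type-safety argument from upthread then applies to the code front end only; the declarative one still needs validation at load time.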


There are a number of widely used orchestrators; it would be nice to deploy to one of those rather than a new kid on the block.


I'm mainly looking at Rust-based projects and haven't been able to find something to use out of the box, without hacky RPC/shell execs. Curious if you have any suggestions?


The big data world largely revolves around Python, like much of the AI world. Many of the people are more focused on the science than on programming, so they aren't interested in the arguments we often see about Rust being a good choice for implementation. They want to use a language they know well to get their job done, hence something like Airflow being asked about in other comments.



I wonder though, what about cases where you have multiple agents or LLM backends and the credentials are shared between all of them?


Agreed, and it's a pattern that OpenAI suggested a few days ago, too [1]. I also built a cross-platform, process-level sandbox that uses parts of OpenAI Codex for the same purpose [2].

[1] https://openai.com/index/the-next-evolution-of-the-agents-sd...

[2] https://github.com/afshinm/zerobox


Vibe coding aside [1], it's very interesting that software projects these days don't really care about adding a single test [2].

[1]: https://github.com/withastro/flue/blob/8fdf8e0e9df5bd33c3120...

[2]: https://github.com/search?q=repo%3Awithastro%2Fflue+test+pat...


I find this impressive: in my experience, codex-rs loves to add tests even when not prompted. Of course, it’s a bit of a crap shoot as to whether the test tests useful behavior.

(My favorite so far: it created an empty file in /home/whatever and added a test to verify that some code it wrote would indeed fail when tested on this empty input and that it would fail with the correct error message. Never mind that this covered approximately none of the desired behavior and that the test would, of course, fail on any other system.)


That would be really interesting. I doubt it's the case, actually probably the opposite? The harnesses seem very happy to write extensive test suites, without me having to ask much.


I find automated tests are the only way to keep vibe coded projects on the rails, especially as you grow something beyond demo phase.


Tests are the new gold. You keep them to avoid a vibe coded fork.


And what would they test? This is a meaningless wrapper for Anthropic or OpenAI SDKs.


> how do we avoid burning tokens solving the same problems over again

Letting the LLM write half baked tools is the recipe for burning more tokens.

> There's a wiki the LLM searches before solving a problem, that links saved programs for past actions to their content entry.

What are the criteria for marking an LLM written tool as useful/correct before publishing it?


  > Letting the LLM write half baked tools is the recipe for burning more tokens.
It sure is, if the tools are half-baked and your user scale is N=1 rather than N=100 or N=1,000

  > What's the criteria for marking an LLM written tool as useful/correct before publishing it?
It solves the problem the originating user asked it to


> It solves the problem the originating user asked it to

Interesting. And is there a mechanism to go back and "fix" the tools after they are published? What happens if the tool decided to use the "id" attribute to click on buttons and now you have a new website that follows a different pattern to find the right target?

I agree that "correctness" of a tool could have different meaning depending on the context of the problem though (e.g. would you consider OOM a correctness bug even if it addresses the user's ask?)


The problem here is that N different users will ask for N different variants of the same tool, so you'll end up with a tool which is similar but not quite the same. Is the tool updated to support new functionality, or is a new tool created, so that you end up with N variants of a tool?

