zachmu's comments | Hacker News

Figures are per ambulance ride.



hmm, perhaps this is the underpinning of why I stopped using Dolt (trying to be too clever makes things harder in the long run)


What's the incentive for people to contribute to an open source project?


Regardless of whether this particular project goes anywhere, it's at least very interesting that Yegge has discovered a way to make multi-agent setups work better. Giving them discrete personas ("you are a senior database engineer with 30 years of experience") and narrower scopes makes them much more effective. This was surprising to me but makes a lot of sense in retrospect.


The part that always struck me as weird about this stuff is that all of these "agents" with their "personas" are the same baseline LLMs with the same training ultimately, just told to basically pretend they're different. How far can that really get you?

I'm not actually a database engineer with 30 years of experience. If somebody demanded that I pretend to be one, I guess I'd give it a shot, but I would expect any actual employer would be able to tell that I don't have the level of knowledge and experience that you'd expect from somebody like that.

If the base LLM actually has the knowledge of all of these specialties, why can't it just apply them all at once, instead of needing to be told to, I guess, pretend to be only one of them?


Agreed, I would really like to understand what this (setting the LLM up to assume a role to improve performance) is doing under the covers and why it works.

Why aren't the labs training models to pick a mantra appropriate to the task and do this themselves? "Huh, a database question. I am going to pretend I'm a database expert with lots of experience. OK, here we go!"


It is DOLT, you were right.


lol it's DOLT, not DoIt.

Yegge's Medium uses a serif font so you can tell, but in many faces you can't.

(We still get this comment constantly and it's very unfortunate)


maybe you should consider a name change then


We build DoltDB, which is a version-controlled SQL database. Recently we've been working with customers doing exactly this, giving an AI agent access to their database. You give the agent its own branch / clone of the prod DB to work on, then merge its changes back to main after review if everything looks good. This requires running Dolt / Doltgres as your database server instead of MySQL / Postgres, of course. But it's free and open source, give it a shot.

https://github.com/dolthub/dolt
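
Roughly, the workflow from a Go client looks like this (a sketch only; the connection string and branch name are made up, and the exact stored-procedure arguments are in the Dolt docs):

    package main

    import (
        "context"
        "database/sql"
        "log"

        _ "github.com/go-sql-driver/mysql" // Dolt speaks the MySQL wire protocol
    )

    func main() {
        ctx := context.Background()

        // Hypothetical connection to a local Dolt sql-server.
        db, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/mydb")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Pin one connection so the checked-out branch sticks to this session.
        conn, err := db.Conn(ctx)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        exec := func(q string) {
            if _, err := conn.ExecContext(ctx, q); err != nil {
                log.Fatal(err)
            }
        }

        // Give the agent its own branch to work on.
        exec("CALL DOLT_CHECKOUT('-b', 'agent-branch')")

        // ... the agent runs its writes here, then commits them ...
        exec("CALL DOLT_COMMIT('-a', '-m', 'agent changes')")

        // After review, merge the branch back to main.
        exec("CALL DOLT_CHECKOUT('main')")
        exec("CALL DOLT_MERGE('agent-branch')")
    }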


I have, and this is an absurd assertion


Agreed.

My n-1 firm was a household name in cybersecurity. We had a _lot_ of contractors but almost no visa-based positions. I worked with two of the few people who were on visas, and they became citizens to stay with the company permanently.


Yeah, this person is just making things up; the majority are not H1Bs. Also, there's this popular idea (usually on places like reddit) that H1Bs at big tech are paid "slave labor wages" because "they can't leave". This is just not true; they get the same salary and stock ranges as everyone else.

Source: was H1B (worked at a couple of big N, but not Amazon) until I got my green card (and then citizenship).


You'll never hear more nonsense about the industry than when the topic of H1Bs comes up.


> 80%+ h1bs

This is laughable, cite a source or get out.


when I was learning Go, I read a guide that told you to fire off a goroutine to walk a tree and send the values back to the main goroutine via a channel. I think about that "just an example" guide a lot when I see bad channel code.

For me the biggest red flag is somebody using a channel as part of an exported library function signature, either as a param or a return value. Almost never the right call.
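
For illustration, the contrast I mean (Node and both functions are contrived):

    package tree

    // Node is a stand-in type for illustration.
    type Node struct {
        Value    int
        Children []*Node
    }

    // WalkChan is the kind of exported signature that's usually a red flag:
    // it forces a concurrency model on every caller, and it leaks a goroutine
    // if the caller stops reading early.
    func WalkChan(root *Node) <-chan *Node {
        out := make(chan *Node)
        var walk func(*Node)
        walk = func(n *Node) {
            if n == nil {
                return
            }
            out <- n
            for _, c := range n.Children {
                walk(c)
            }
        }
        go func() {
            defer close(out)
            walk(root)
        }()
        return out
    }

    // Walk is the plainer alternative: a synchronous visit callback. Callers
    // that want concurrency can wrap it in their own goroutine.
    func Walk(root *Node, visit func(*Node)) {
        if root == nil {
            return
        }
        visit(root)
        for _, c := range root.Children {
            Walk(c, visit)
        }
    }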


I've used that pattern to write tools to e.g. re-encrypt all however many millions of objects in an S3 bucket, and examine 400m files for jars that are or contain the vulnerable log4j code. I had a large machine near the bucket/NFS filer in question and wanted to use all the CPUs. It worked well for that purpose. The API is: you provide a callback for each depth of the tree, and that callback is given an array of channels and the current object to examine; your CB figures out whether that object (could be an S3 path, object, version, directory, file, jar inside a jar, whatever) meets the criteria for whatever action is at hand, or whether it generates more objects for the tree. I was able to do stuff in like 8 hours when AWS support was promising 10 days. And I deleted the bad log4j jar a few times a day while we tracked down the repos/code still putting it back on the NFS filer.

The library is called "go-treewalk" :) The data of course never ends up back in main; it's for doing things or maybe printing out data, not doing more calculation across the tree.
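
This isn't the actual go-treewalk API, but the general shape, a work queue fanned out to one worker per CPU with a callback that can push more items onto the queue, is roughly:

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // Item stands in for whatever gets examined: an S3 prefix, an object
    // version, a directory, a file, a jar inside a jar, and so on.
    type Item string

    func main() {
        work := make(chan Item)
        var pending sync.WaitGroup

        // emit queues another item; the send gets its own goroutine so a
        // worker never deadlocks by blocking on a full queue.
        emit := func(it Item) {
            pending.Add(1)
            go func() { work <- it }()
        }

        // process examines one item and could call emit for any children it
        // generates (entries of a directory, jars nested in a jar, etc).
        process := func(it Item) {
            fmt.Println("visited", it)
        }

        // One worker per CPU, all pulling from the same channel.
        for i := 0; i < runtime.NumCPU(); i++ {
            go func() {
                for it := range work {
                    process(it)
                    pending.Done()
                }
            }()
        }

        emit(Item("root"))
        pending.Wait()
        close(work)
    }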


> when I was learning Go, I read a guide that told you to fire off a goroutine to walk a tree and send the values back to the main goroutine via a channel.

Okay, I gotta ask - what exactly is wrong with this approach? Unless you're starting only a single goroutine[1], this seems to me like a reasonable approach.

Think about recursively finding all files in a directory that match a particular filter, and then performing some action on the matches. It's better to start a goroutine that sends each match to the caller via a channel so that as each file is found the caller can process them while the searcher is still finding more matches.

The alternatives are:

1. No async searching: the tree-walker simply collects all the results into a list and returns the one big list when it is done, at which point the caller starts processing the list.

2. Depending on which language you are using, maybe have actual coroutines, so that the caller can re-call the callee continuously until it gets no result, while the callee can call `yield(result)` for each result.

Both of those seem like poor choices in Go.

[1] And even then, there are some cases where you'd actually want the tree-walking to be asynchronous, so starting a single goroutine so you can do other stuff while walking the tree is a reasonable approach.
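
Concretely, the approach I'm defending looks roughly like this, with filepath.WalkDir feeding matches to the caller as they're found (a sketch; findMatches and the filter are made up):

    package main

    import (
        "fmt"
        "io/fs"
        "path/filepath"
        "strings"
    )

    // findMatches walks root in its own goroutine and sends each matching
    // path on the returned channel, closing it when the walk finishes.
    func findMatches(root string, match func(string) bool) <-chan string {
        out := make(chan string)
        go func() {
            defer close(out)
            filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
                if err != nil {
                    return err
                }
                if !d.IsDir() && match(path) {
                    out <- path
                }
                return nil
            })
        }()
        return out
    }

    func main() {
        // The caller starts processing matches while the walk is still running.
        isGoFile := func(p string) bool { return strings.HasSuffix(p, ".go") }
        for path := range findMatches(".", isGoFile) {
            fmt.Println(path)
        }
    }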


> what exactly is wrong with this approach?

Before Go had iterators, you either had callbacks or channels to decompose work.

If you have a lot of files on a local ssd, and you're doing nothing interesting with the tree entries, it's a lot of work for no payoff. You're better off just passing a callback function.

If you're walking an NFS directory hierarchy and the computation on each entry is substantial then there's value in it because you can run computations while waiting on the potentially slow network to return results.

In the case of the callback, it is a janky interface because you would need to partially apply the function you want to do the work or pass a method on a custom struct that holds state you're trying to accumulate.

Now that iterators are becoming a part of the language ecosystem, one can use an iterator to decompose the walking and the computation without the jank of a partially applied callback and without the overhead of a channel.
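
The callback jank looks something like this (a sketch; walkTree and the names are made up):

    package main

    import (
        "fmt"
        "strings"
    )

    // matchCollector holds the state the callback accumulates into.
    type matchCollector struct {
        matches []string
    }

    func (c *matchCollector) visit(path string) {
        if strings.HasSuffix(path, ".jar") {
            c.matches = append(c.matches, path)
        }
    }

    // walkTree stands in for whatever walker takes the callback.
    func walkTree(root string, visit func(string)) {
        for _, p := range []string{root + "/a.jar", root + "/b.txt"} {
            visit(p)
        }
    }

    func main() {
        c := &matchCollector{}
        walkTree("/tmp/example", c.visit)
        fmt.Println(c.matches)
    }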


Assuming the latest Go (1.23), I would write an iterator and use goroutines internally.

The caller would do:

    for f := range asyncDirIter(dir) {
        // process each f as it arrives
    }
Better than exposing a channel.
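
Internally it might look roughly like this (assuming Go 1.23's iter.Seq and filepath.WalkDir; a sketch, not something I've shipped):

    package dirwalk

    import (
        "io/fs"
        "iter"
        "path/filepath"
    )

    // asyncDirIter walks dir in a background goroutine and yields paths as
    // they are discovered; the walk is abandoned if the caller stops early.
    func asyncDirIter(dir string) iter.Seq[string] {
        return func(yield func(string) bool) {
            paths := make(chan string)
            done := make(chan struct{})
            go func() {
                defer close(paths)
                filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
                    if err != nil {
                        return err
                    }
                    select {
                    case paths <- path:
                        return nil
                    case <-done:
                        return fs.SkipAll // consumer broke out of the loop
                    }
                })
            }()
            defer close(done)
            for p := range paths {
                if !yield(p) {
                    return
                }
            }
        }
    }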

But my first question would be: is it really necessary? Are you really scanning directories so large that async dir traversal is beneficial?

I actually did that once, but that was for a program that scanned the whole drive. I wouldn't do it for scanning a local drive with 100 files.

Finally, you redefined "traversing a tree" as "traversing a filesystem".

I assume that the post you're responding to was talking about traversing a tree structure in memory. In that context using goroutines is overkill. Harder to implement, harder to use, and slower.


> In that context using goroutines is overkill. Harder to implement, harder to use, and slower.

I agree with this, but my assumption was very different to yours: that the tree was sufficiently large and/or the processing was sufficiently long to make the caller wait unreasonably long while walking the tree.

For scanning < 1000 files on the local filesystem, I'd probably just scan it and return a list populated by a predicate function.

For even 20 files on a network filesystem, I'd make it async.


The only time I've seen it work with channels in the API is when it's something you'd realistically want to be async (say, some sort of heavy computation, network request, etc). The kind of thing that would probably already be a future/promise/etc in other languages.

And it doesn't really color the function because you can trivially make it sync again.


> And it doesn't really color the function because you can trivially make it sync again.

Yes, but this goes both ways: You can trivially make the sync function async (assuming it's documented as safe for concurrent use).

So I would argue that the sync API design is simpler and more natural. Callers can easily set up their own goroutine and channels around the function call if they need or want that. But if they don't need or want that, everything is simpler and they don't even need to think about channels.
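
For example, a trivial wrapper around a hypothetical synchronous FindMatches (assumed safe for concurrent use; both names are made up):

    package search

    // AsyncFindMatches runs the synchronous FindMatches on its own goroutine
    // and delivers the result on a channel the caller can receive from later.
    func AsyncFindMatches(dir string, match func(string) bool) <-chan []string {
        out := make(chan []string, 1)
        go func() {
            out <- FindMatches(dir, match)
        }()
        return out
    }

    // FindMatches stands in for any synchronous, concurrency-safe function.
    func FindMatches(dir string, match func(string) bool) []string {
        var results []string
        // ... walk dir, appending each path for which match(path) is true ...
        return results
    }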


> The kind of thing that would probably already be a future/promise/etc in other languages.

Or a coroutine (the callee calls `yield(item)` for each match, and returns when done).


At the time, that was one of the only ways to write decent-looking generic code.

