zachmu's comments | Hacker News

Figures are per ambulance ride.



hmm, perhaps this is the underpinning of why I stopped using Dolt (trying to be too clever makes things harder in the long run)


What's the incentive for people to contribute to an open source project?


Regardless of whether this particular project goes anywhere, it's at least very interesting that Yegge has discovered a way to make multi-agent setups work better. Giving them discrete personas ("you are a senior database engineer with 30 years of experience") and narrower scopes makes them much more effective. This was surprising to me but makes a lot of sense in retrospect.


The part that always struck me as weird about this stuff is that all of these "agents" with their "personas" are the same baseline LLMs with the same training ultimately, just told to basically pretend they're different. How far can that really get you?

I'm not actually a database engineer with 30 years of experience. If somebody demanded that I pretend to be one, I guess I'd give it a shot, but I would expect any actual employer would be able to tell that I don't have the level of knowledge and experience that you'd expect from somebody like that.

If the base LLM actually has the knowledge of all of these specialties, why can't it just apply them all at once, instead of needing to be told to, I guess, pretend to be only one of them?


Agreed, I would really like to understand what this (setting the LLM up to assume a role to improve performance) is doing under the covers and why it works.

Why aren't the labs training models to pick a mantra appropriate to the task and do this themselves? "Huh, a database question. I am going to pretend I'm a database expert with lots of experience. OK, here we go!"


It is DOLT, you were right.


lol it's DOLT, not DoIt.

Yegge's Medium uses a serif font so you can tell, but in many faces you can't.

(We still get this comment constantly and it's very unfortunate)


maybe you should consider a name change then


We build DoltDB, which is a version-controlled SQL database. Recently we've been working with customers doing exactly this, giving an AI agent access to their database. You give the agent its own branch / clone of the prod DB to work on, then merge its changes back to main after review if everything looks good. This requires running Dolt / Doltgres as your database server instead of MySQL / Postgres, of course. But it's free and open source, give it a shot.

https://github.com/dolthub/dolt
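
Roughly, the workflow from a Go client looks like this (a sketch only; the connection string and branch name are made up, and the exact stored-procedure arguments are in the Dolt docs):

    package main

    import (
        "context"
        "database/sql"
        "log"

        _ "github.com/go-sql-driver/mysql" // Dolt speaks the MySQL wire protocol
    )

    func main() {
        ctx := context.Background()

        // Hypothetical connection to a local Dolt sql-server.
        db, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/mydb")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Pin one connection so the checked-out branch sticks to this session.
        conn, err := db.Conn(ctx)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        exec := func(q string) {
            if _, err := conn.ExecContext(ctx, q); err != nil {
                log.Fatal(err)
            }
        }

        // Give the agent its own branch to work on.
        exec("CALL DOLT_CHECKOUT('-b', 'agent-branch')")

        // ... the agent runs its writes here, then commits them ...
        exec("CALL DOLT_COMMIT('-a', '-m', 'agent changes')")

        // After review, merge the branch back to main.
        exec("CALL DOLT_CHECKOUT('main')")
        exec("CALL DOLT_MERGE('agent-branch')")
    }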


I have, and this is an absurd assertion


Agreed.

My n-1 firm was a household name in cybersecurity. We had a _lot_ of contractors but almost no visa-based positions. I worked with two of the few people who were on visas, and they became citizens to stay with the company permanently.


Yeah, this person is just making things up; the majority are not H1Bs. Also, there's this popular idea (usually on places like reddit) that H1Bs at big tech are paid "slave labor wages" because "they can't leave". This is just not true; they get the same salary and stock ranges as everyone else.

Source: was H1B (worked at a couple of big N, but not Amazon) until I got my green card (and then citizenship).


You'll never hear more nonsense about the industry than when the topic of H1Bs comes up.


> 80%+ h1bs

This is laughable, cite a source or get out.


when I was learning Go, I read a guide that told you to fire off a goroutine to walk a tree and send the values back to the main goroutine via a channel. I think about that "just an example" guide a lot when I see bad channel code.

For me the biggest red flag is somebody using a channel as part of an exported library function signature, either as a param or a return value. Almost never the right call.
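
For illustration, the contrast I mean (Node and both functions are contrived):

    package tree

    // Node is a stand-in type for illustration.
    type Node struct {
        Value    int
        Children []*Node
    }

    // WalkChan is the kind of exported signature that's usually a red flag:
    // it forces a concurrency model on every caller, and it leaks a goroutine
    // if the caller stops reading early.
    func WalkChan(root *Node) <-chan *Node {
        out := make(chan *Node)
        var walk func(*Node)
        walk = func(n *Node) {
            if n == nil {
                return
            }
            out <- n
            for _, c := range n.Children {
                walk(c)
            }
        }
        go func() {
            defer close(out)
            walk(root)
        }()
        return out
    }

    // Walk is the plainer alternative: a synchronous visit callback. Callers
    // that want concurrency can wrap it in their own goroutine.
    func Walk(root *Node, visit func(*Node)) {
        if root == nil {
            return
        }
        visit(root)
        for _, c := range root.Children {
            Walk(c, visit)
        }
    }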


I've used that pattern to write tools to e.g. re-encrypt all however many millions of objects in an S3 bucket, and examine 400m files for jars that are or contain the vulnerable log4j code. I had a large machine near the bucket/NFS filer in question and wanted to use all the CPUs. It worked well for that purpose. The API is: you provide a callback for each depth of the tree, and that callback is given an array of channels and the current object to examine; your CB figures out whether that object (could be an S3 path, object, version, directory, file, jar inside a jar, whatever) meets the criteria for whatever action is at hand, or whether it generates more objects for the tree. I was able to do stuff in like 8 hours when AWS support was promising 10 days. And I deleted the bad log4j jar a few times a day while we tracked down the repos/code still putting it back on the NFS filer.

The library is called "go-treewalk" :) The data of course never ends up back in main; it's for doing things or maybe printing out data, not doing more calculation across the tree.
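
This isn't the actual go-treewalk API, but the general shape, a work queue fanned out to one worker per CPU with a callback that can push more items onto the queue, is roughly:

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // Item stands in for whatever gets examined: an S3 prefix, an object
    // version, a directory, a file, a jar inside a jar, and so on.
    type Item string

    func main() {
        work := make(chan Item)
        var pending sync.WaitGroup

        // emit queues another item; the send gets its own goroutine so a
        // worker never deadlocks by blocking on a full queue.
        emit := func(it Item) {
            pending.Add(1)
            go func() { work <- it }()
        }

        // process examines one item and could call emit for any children it
        // generates (entries of a directory, jars nested in a jar, etc).
        process := func(it Item) {
            fmt.Println("visited", it)
        }

        // One worker per CPU, all pulling from the same channel.
        for i := 0; i < runtime.NumCPU(); i++ {
            go func() {
                for it := range work {
                    process(it)
                    pending.Done()
                }
            }()
        }

        emit(Item("root"))
        pending.Wait()
        close(work)
    }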


> when I was learning Go, I read a guide that told you to fire off a goroutine to walk a tree and send the values back to the main goroutine via a channel.

Okay, I gotta ask - what exactly is wrong with this approach? Unless you're starting only a single goroutine[1], this seems to me like a reasonable approach.

Think about recursively finding all files in a directory that match a particular filter, and then performing some action on the matches. It's better to start a goroutine that sends each match to the caller via a channel so that as each file is found the caller can process them while the searcher is still finding more matches.

The alternatives are:

1. No async searching: the tree-walker simply collects all the results into a list and returns the one big list when it is done, at which point the caller starts processing the list.

2. Depending on which language you are using, maybe have actual coroutines, so that the caller can re-call the callee continuously until it gets no result, while the callee can call `yield(result)` for each result.

Both of those seem like poor choices in Go.

[1] And even then, there are some cases where you'd actually want the tree-walking to be asynchronous, so starting a single goroutine so you can do other stuff while walking the tree is a reasonable approach.
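
Concretely, the approach I'm defending looks roughly like this, with filepath.WalkDir feeding matches to the caller as they're found (a sketch; findMatches and the filter are made up):

    package main

    import (
        "fmt"
        "io/fs"
        "path/filepath"
        "strings"
    )

    // findMatches walks root in its own goroutine and sends each matching
    // path on the returned channel, closing it when the walk finishes.
    func findMatches(root string, match func(string) bool) <-chan string {
        out := make(chan string)
        go func() {
            defer close(out)
            filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
                if err != nil {
                    return err
                }
                if !d.IsDir() && match(path) {
                    out <- path
                }
                return nil
            })
        }()
        return out
    }

    func main() {
        // The caller starts processing matches while the walk is still running.
        isGoFile := func(p string) bool { return strings.HasSuffix(p, ".go") }
        for path := range findMatches(".", isGoFile) {
            fmt.Println(path)
        }
    }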


> what exactly is wrong with this approach?

Before Go had iterators, you either had callbacks or channels to decompose work.

If you have a lot of files on a local ssd, and you're doing nothing interesting with the tree entries, it's a lot of work for no payoff. You're better off just passing a callback function.

If you're walking an NFS directory hierarchy and the computation on each entry is substantial then there's value in it because you can run computations while waiting on the potentially slow network to return results.

In the case of the callback, it is a janky interface because you would need to partially apply the function you want to do the work or pass a method on a custom struct that holds state you're trying to accumulate.

Now that iterators are becoming a part of the language ecosystem, one can use an iterator to decompose the walking and the computation without the jank of a partially applied callback and without the overhead of a channel.
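
The callback jank looks something like this (a sketch; walkTree and the names are made up):

    package main

    import (
        "fmt"
        "strings"
    )

    // matchCollector holds the state the callback accumulates into.
    type matchCollector struct {
        matches []string
    }

    func (c *matchCollector) visit(path string) {
        if strings.HasSuffix(path, ".jar") {
            c.matches = append(c.matches, path)
        }
    }

    // walkTree stands in for whatever walker takes the callback.
    func walkTree(root string, visit func(string)) {
        for _, p := range []string{root + "/a.jar", root + "/b.txt"} {
            visit(p)
        }
    }

    func main() {
        c := &matchCollector{}
        walkTree("/tmp/example", c.visit)
        fmt.Println(c.matches)
    }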


Assuming the latest Go (1.23), I would write an iterator and use goroutines internally.

The caller would do:

    for f := range asyncDirIter(dir) {
        // process each f as it arrives
    }
Better than exposing a channel.
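
Internally it might look roughly like this (assuming Go 1.23's iter.Seq and filepath.WalkDir; a sketch, not something I've shipped):

    package dirwalk

    import (
        "io/fs"
        "iter"
        "path/filepath"
    )

    // asyncDirIter walks dir in a background goroutine and yields paths as
    // they are discovered; the walk is abandoned if the caller stops early.
    func asyncDirIter(dir string) iter.Seq[string] {
        return func(yield func(string) bool) {
            paths := make(chan string)
            done := make(chan struct{})
            go func() {
                defer close(paths)
                filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
                    if err != nil {
                        return err
                    }
                    select {
                    case paths <- path:
                        return nil
                    case <-done:
                        return fs.SkipAll // consumer broke out of the loop
                    }
                })
            }()
            defer close(done)
            for p := range paths {
                if !yield(p) {
                    return
                }
            }
        }
    }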

But my first question would be: is it really necessary? Are you really scanning directories so large that async dir traversal is beneficial?

I actually did that once, but that was for a program that scanned the whole drive. I wouldn't do it for scanning a local drive with 100 files.

Finally, you redefined "traversing a tree" as "traversing a filesystem".

I assume that the post you're responding to was talking about traversing a tree structure in memory. In that context using goroutines is overkill. Harder to implement, harder to use, and slower.


> In that context using goroutines is overkill. Harder to implement, harder to use, and slower.

I agree with this, but my assumption was very different to yours: that the tree was sufficiently large and/or the processing was sufficiently long to make the caller wait unreasonably long while walking the tree.

For scanning < 1000 files on the local filesystem, I'd probably just scan it and return a list populated by a predicate function.

For even 20 files on a network filesystem, I'd make it async.


The only time I've seen it work with channels in the API is when it's something you'd realistically want to be async (say, some sort of heavy computation, network request, etc). The kind of thing that would probably already be a future/promise/etc in other languages.

And it doesn't really color the function because you can trivially make it sync again.


> And it doesn't really color the function because you can trivially make it sync again.

Yes, but this goes both ways: You can trivially make the sync function async (assuming it's documented as safe for concurrent use).

So I would argue that the sync API design is simpler and more natural. Callers can easily set up their own goroutine and channels around the function call if they need or want that. But if they don't need or want that, everything is simpler and they don't even need to think about channels.
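
For example, a trivial wrapper around a hypothetical synchronous FindMatches (assumed safe for concurrent use; both names are made up):

    package search

    // AsyncFindMatches runs the synchronous FindMatches on its own goroutine
    // and delivers the result on a channel the caller can receive from later.
    func AsyncFindMatches(dir string, match func(string) bool) <-chan []string {
        out := make(chan []string, 1)
        go func() {
            out <- FindMatches(dir, match)
        }()
        return out
    }

    // FindMatches stands in for any synchronous, concurrency-safe function.
    func FindMatches(dir string, match func(string) bool) []string {
        var results []string
        // ... walk dir, appending each path for which match(path) is true ...
        return results
    }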


> The kind of thing that would probably already be a future/promise/etc in other languages.

Or a coroutine (the callee calls `yield(item)` for each match, and returns when done).


At the time, that was one of the only ways to write decent-looking generic code.

