
> most myth pushing content

Care to elaborate?


There are lots of lists of LLM myths out there, e.g. https://masterofcode.com/blog/llms-myths-vs-reality-what-you... Every single post glosses over some aspect of these myths, or posits that they can be controlled or mitigated in some way, with no examples of anyone applying those solutions to real-world problems in a supportable and reliable way. When pushed, another myth from the same neighborhood gets offered instead: the system will get better, some classical computing mechanism will make up the difference, the problems aren't so bad, the solution is good enough in some ambiguous way, or people and existing systems are just as bad (when they are not).


I've written extensively about myths and misconceptions about LLMs, much of which overlaps with the observations in that post.

Here's my series about misconceptions: https://simonwillison.net/series/llm-misconceptions/

It doesn't seem to me that you're familiar with my work - you seem to be mixing me in with the vast ocean of uncritical LLM boosting content that's out there.


I'm thinking of the system you built to watch videos and parse JSON, and the claims that it has general suitability, which is simply dishonest imo. You seem to be confusing me with someone who hasn't been asking you repeatedly to address these kinds of concerns, and the above series is a kind of Potemkin set of things that doesn't intersect with your other work.


You mean this? https://simonwillison.net/2024/Oct/17/video-scraping/

To my surprise, on re-reading that post I didn't mention that you need to double-check everything it does. I guess I forgot to mention that at the time because I thought it was so obvious - anyone who's paying attention to LLMs should already know that you can't trust them to reliably extract this kind of information.

I've mentioned that a lot in my other writing. I frequently tell people that the tricky thing about working with LLMs is learning how to make use of a technology that is inherently unreliable.

Update: added a new note about reliability here: https://simonwillison.net/2024/Oct/17/video-scraping/#a-note...

Second update: I just noticed that I DID say "You should never trust these things not to make mistakes, so I re-watched the 35 second video and manually checked the numbers. It got everything right." in that post already.

> You seem to be confusing me with someone that hasn't been asking you repeatedly to address these kinds of concerns

Where did you do that?


> dishonest Potemkin

It's like criticizing a "Hello World" program for not having proper error handling and security protocols. While those are important for production systems, they're not the point of a demonstration or learning example.

Your response seems to take these examples and hold them to the standard of mission-critical systems, which is a form of technical gatekeeping - raising the bar unnecessarily high for what counts as a "valid" technical demonstration.


If you do any software engineering at all, you know that a 1k LoC reduction achieving the same functionality at the same or better performance is non-trivial.


They’re questioning whether it was a valuable use of time, not whether a spreadsheet of PRs was time-consuming, which is apparent.


Cool cheatsheet. As someone new to Rust, what are the benefits versus Go, C++, and C?


A sibling comment already mentioned the type system as a whole, but I wish to highlight one specific feature: Rust has algebraic data types.

The term sounds academic, but I honestly can't see a modern, programmer-friendly language not having proper discriminated union types in 2024. Go's lack of sum types is not simplicity. It's a glaring omission, forcing programmers to rely on "idioms" like using tuples to return errors. Having only product types (structs) is like trying to do arithmetic with only multiplication, without addition.
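A minimal Rust sketch of the contrast (parse_port is a made-up example):

    // Result is a sum type: a value is Ok(T) or Err(E), never both and
    // never neither -- unlike Go's (value, error) convention, where
    // nothing stops you from ignoring the error and using a zero value.
    fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
        s.parse()
    }

    fn main() {
        // The match must handle every variant; forgetting the Err arm
        // is a compile error, not a silent bug.
        match parse_port("8080") {
            Ok(port) => println!("port {port}"),
            Err(e) => eprintln!("bad port: {e}"),
        }
    }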


> I honestly can't see a modern, programmer-friendly language not having proper discriminated union types in 2024

Does that really need to be part of the language, though? As long as you can code it yourself, or have it in the standard library, isn't that fine?

What can Rust's unions do that std::variant cannot?


`std::variant` is very awkward to use, and has design compromises because it doesn't have language support. Besides, the ability to write something like `std::variant` as a pure library type first requires you to have a very complex type system, more complex than Rust's, and certainly more complex than a "simple" language such as C or Go would ever consider adopting.

C++ is going to have pattern matching "any year now", and it's going to make `std::variant` more ergonomic to use, but on the other hand any pattern matching feature will have to make design compromises in order to support `std::variant` and the other mutually-incompatible variant-like library types of which C++ has a bunch (pointers, unions, `std::optional`, `std::any`, `std::expected`, did I miss any?). Add to that all the zillions of third-party variant-like types (including boost::variant) that exist in the wild because the code base predates C++17 and/or doesn't want to use C++17 features for whatever reason. All this complexity could've been avoided had the language just supported real sum types from the start, or at least from C++11 onward.

As a sibling commenter noted, algebraic data types and pattern matching with compile-time exhaustiveness checking go hand in hand; I meant to mention the latter in my original comment, but left it out because I consider the latter almost implied by the former.


Right, I'm not trying to argue that std::variant is better than native language support. By definition, a language construct will always be easier to write and read.

All I'm saying is that the current state of std::variant makes type-safe discriminated unions okay enough to use.

Visit + overload is not that far from pattern matching in terms of readability, clang does warn on non-exhaustive switch cases, etc.


You did ask more generally:

> Does that really need to be part of the language though, or as long as you can code it, or have it in the standard library, it's fine?

And my answer is, yes. I don't consider `std::variant` a proper replacement, more like a crutch that may even be worse than not having anything at all, because its existence can be used as an argument against introducing language-level sum types in the future.


Rust's language-level support for pattern matching (including exhaustiveness checking) is very nice. Most languages with sum types have this feature; it's hard to imagine one without the other. I haven't used `std::variant` much, but I remember finding it unergonomic. Real-world C++ code uses `std::variant` far less often than Rust and functional code use sum types, probably for that reason.

UPDATE: also, some advantages coming from the interaction of enums with other Rust language features:

- Because Rust doesn't have a stable ABI, the compiler is free to aggressively optimize the layout of enums. The size of an enum is often equal to (rather than greater than) the size of its largest variant, if that variant has some "impossible" bit patterns (like null pointers or invalid char values) that can be used for storing the tag; see the sketch after this list.

- Because Rust traits can (and must) be implemented outside of the type definition, you can implement a trait directly for an enum and then use that enum as a polymorphic "trait object". In C++, if you want to polymorphically use an `std::variant` as a subclass of something, you need to define an awkward wrapper class that inherits from the parent.
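The layout point is easy to verify; a quick sketch using only std::mem::size_of:

    use std::mem::size_of;

    fn main() {
        // Niche optimization: a reference can never be null, so None
        // reuses the all-zeros bit pattern and Option<&u8> needs no
        // extra tag byte on top of the pointer itself.
        assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
        println!("Option<&u8>: {} bytes", size_of::<Option<&u8>>());
    }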


I agree, the lack of pattern matching is a bummer. You should retry std::variant with the lambda overload trick though. It's kind of okay in terms of syntax.

    using MyDiscriminatedUnion = std::variant<MyType1, MyType2, MyType3>;

    MyDiscriminatedUnion var = MyType2{};

    // `overload` is the usual merge-lambdas helper; a definition is
    // given in the next comment.
    std::visit(overload {
        [](MyType1 t1) {...},
        [](MyType2 t2) {...},
        [](auto t) {...},
    }, var);


You can even do better with something like:

    namespace StlHelpers {

    template<class... Ts> struct overload : Ts... { using Ts::operator()...; };
    template<class... Ts> overload(Ts...) -> overload<Ts...>;

    template<class var_t, class... Func>
    auto visit(var_t& variant, Func&&... funcs)
    {
        return std::visit(overload{ funcs... }, variant);
    }

    }  // namespace StlHelpers

And then:

    StlHelpers::visit(var,
        [](MyType1 t1) {...},
        [](MyType2 t2) {...},
        [](auto t) {...}
    );


Sibling doesn't really cover the benefits vs. Go, so here's my attempt at a list.

* Rust offers memory safety without a GC. You may or may not want a GC. If you don't, then Rust is the better option.

* More broadly, Rust has a C++-like focus on providing zero cost abstractions. Go is generally happy to accept a small runtime cost for abstraction.

* Rust can generate small WASM targets because it doesn't need to bundle a runtime. Go can target WASM too, but you either need to accept much larger object sizes or use TinyGo, which doesn't implement all Go language features (although it's pretty close now that generics support has arrived).

* Rust code generally runs faster (although a lot depends on whether you're writing the kind of code where a GC is a net positive or a net negative for performance).

* Rust has a fancier type system that's much more able to express invariants. If your happy place is a place where the type system proves that your code is correct, then Rust will make you much happier than Go. The Go type system (and the culture around Go more generally) tends not to favor elaborate abstractions built on types.

* Rust has fully-featured macros, if that's your bag.

I think there are also some disadvantages of Rust compared to Go, but I've only attempted to list the advantages here.


I'd like to elaborate on the "fancier type system". It's not all academic. There are obvious practical advantages:

- No more bugs where a value that shouldn't be mutated is accidentally mutated in another place.

- No more bugs with writing to a closed channel or file.

- No more bugs with forgetting to close a file.

- No more null pointer errors at runtime.

- No more runtime reflection errors.

Compared to Go, safe Rust provides all these benefits and more.
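For a flavor of the no-null and no-accidental-mutation points, a small sketch:

    fn main() {
        // No null: absence is an explicit Option, and the compiler
        // forces you to handle None before using the value.
        let names = ["alice", "bob"];
        match names.iter().find(|n| n.starts_with('c')) {
            Some(name) => println!("found {name}"),
            None => println!("no match"),
        }

        // No accidental mutation: bindings are immutable unless
        // declared with `mut`.
        let total = 10;
        // total += 1; // compile error: cannot assign to immutable binding
        println!("total = {total}");
    }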


> No more null pointer errors at runtime

This is the most glaring thing IMO. Go repeating the billion-dollar null pointer mistake is inexcusable. There's zero reason for a language designed in the last 20 years to have this problem. This alone is enough to make me want to stay away from Go.


Good summary. This also illustrates why the Go compiler is much faster than the Rust compiler: it does a lot less work, pushing problems down to run time.


Rust can:

- Do the things C and C++ can do.

- Without the memory corruption issues those languages are infamous for.

- With the conveniences you'd expect of any post-internet language. (A library ecosystem that's unified around a standard build system and package manager, an async IO story, UTF-8 strings, etc.)


Annoyingly, said library ecosystem with the standard build system and package manager becomes a pain to deal with when you're trying to do something like add packages written in the language to distros, requiring a bunch of hacks just to keep Cargo from reaching online to fetch all the dependencies. Also, the way feature flags are used causes a combinatorial explosion of packages just so you can have every single variant packaged, because it's the only way to be sure the software can be reliably compiled.

I feel that this could have been provided even without having Cargo and crates repeat the mistakes of both Maven and NPM.

At least the async IO is nice enough even if it does rely on a bunch of sometimes uncontrollable heap allocation. I'd prefer CSP personally, but it could be worse. Although with that you also couldn't avoid allocations.


> becomes a pain to deal with when you're trying to do something like add packages made in the language to distros, requiring a bunch of hacks

The package managers in distros are pretty awful for a language like Rust, though. They are designed for dynamically linked C code, not a language where small, developer-published libraries are the norm and there’s no dynamic linking. Distro package managers also don’t support Rust’s feature flags well (C programs with compile-time config often have the same problem).

Apt, rpm and friends’ biggest problem is they’re awful for developers. If I write a program or library for people to use, now I’m expected to test and keep up-to-date packages (or at a minimum build instructions) for like, 6 different operating systems. “On Debian, apt install packages X and Y. Z is also needed but it’s out of date, so install that from source. On Ubuntu it’s nearly the same, but library Z is usable from apt. On Red Hat everything is available but named differently. And Gentoo. And Arch. And NixOS. And FreeBSD pkg. Also here’s the configure script. And CMake, Visual Studio project files, Xcode project files, Homebrew and a Windows installer too.”

What version of rust is even available on Debian and redhat? Is it 2 months old or 2 years old? Do my rust project’s dependencies work on that version of rustc? Are they available in apt? Urgh just kill me.

Cargo means I can just ship my project in the form I use while developing. Users get all the latest packages, chosen by me, no matter their OS. And I know their build environment is sane. Hate on cargo if you want, but cargo, npm and friends are the only sane way to ship cross platform code.


In this particular case, I think it's distros who make life hard for themselves by trying to force a square peg into a round hole. Cargo supports vendoring quite well, so, in my opinion, distros should simply vendor all dependencies of a Rust application into its package together with Cargo.lock file.

It may cause some duplication across packages, but the total is arguably quite small when measured in MB. Also, a new release of an upstream crate may trigger several updates of downstream packages even if the downstream apps did not release new versions, but distros are not known for quick updates either way, so it should not be a big issue.
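For what it's worth, `cargo vendor` already generates the offline configuration for this; shipping the vendor/ directory plus a snippet like the one it prints (reproduced here from memory) in `.cargo/config.toml` is enough for fully offline builds:

    [source.crates-io]
    replace-with = "vendored-sources"

    [source.vendored-sources]
    directory = "vendor"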


I don't really know what distro package managers are offering here.


How do the distro packages work? Are they trying to provide dependencies as pre-built binaries? I didn't know Cargo could consume binaries like that.


I can't speak for other distros, but at least in Fedora what happens is that library code is distributed in various devel packages, where the base package for, say, "futures-io" contains the actual code: that's "rust-futures-io-devel.noarch". After this, you get various "subpackages" for each feature. These are mostly there so that you can declare in a package that you need certain features; the feature packages are fully virtual, it seems, even though they all claim ownership of the relevant Cargo.toml in the local registry.

So to be fair, I was incorrect about it being a combinatorial explosion, since I was under the impression that each combination of features would be a package, but this makes a lot more sense. It's still quite a foreign way of packaging software, though. Although I'm glad that at least Cargo can be operated offline and from official repos.


I installed a few Rust binaries the other day, like wasm-tools and the typst compiler, using cargo. Each program probably had 30-50 dependencies which were downloaded and compiled from crates.io.

Do fedora and apt try to mirror all of the packages from crates.io? Are they kept up to date? Is this a manual process, where a human picks and chooses some packages and hopes nothing is missing, or is it a live mirror? If it’s done by hand, what are the chances all the dependencies for some given project will even be available?


Rust does not have offsetof (the real deal, not some pointer-based hack in a third-party crate), and using FFI to call C or C++ practically requires bindgen, which is not stable, not part of the standard library, and tricky to configure in a portable way. Rust also doesn't have a stable ABI yet, though slow progress is being made.

If you can write pure Rust in a single library/binary, these aren't major issues, but as a drop-in replacement for C/C++ in many of the areas where those languages are heavily used today, the edges can be surprisingly sharp.
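For context, the hand-written alternative that bindgen automates looks like this; a minimal sketch declaring and calling libc's abs by hand:

    use std::os::raw::c_int;

    // Hand-written extern declaration: the programmer, not a tool, is
    // responsible for matching the C signature exactly.
    extern "C" {
        fn abs(x: c_int) -> c_int;
    }

    fn main() {
        let v = unsafe { abs(-42) };
        assert_eq!(v, 42);
    }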



For those kinds of applications, Zig can feel closer to C while removing some of the pain points, such as error handling, matching, null checks, slices, etc.


I don’t know Go; compared to C and C++:

* safe by default — for example, all array accesses are checked by default. You can do unchecked access for speed when you need to, and you can check for safety in C and C++ if you want, but I feel nowadays we really should be using “safe by default”.

* much easier to parallelize. I had basically given up on multithreading in C++ and believed it was almost impossible to do well. In Rust, in my experience, if parallel code compiles, it works correctly. This is because the borrow checker stops you from ever mutating anything in two threads at once, at compile time (sketched below).
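A small sketch of the second point, using scoped threads; without the Mutex, the unsynchronized mutable captures would be rejected at compile time rather than racing at run time:

    use std::sync::Mutex;
    use std::thread;

    fn main() {
        let data = Mutex::new(vec![1, 2, 3]);

        thread::scope(|s| {
            for _ in 0..4 {
                s.spawn(|| {
                    // Shared mutation must go through the Mutex; the
                    // borrow checker rejects unsynchronized access.
                    data.lock().unwrap().push(0);
                });
            }
        });

        println!("{:?}", data.lock().unwrap());
    }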


The author is maybe too harsh, but Milvus should be there for sure.


Couldn't you use split normalization to improve other networks too?

Also didn't realize Milvus had a pure Python version. Is there even a reason to use Chroma now?


> AT&T is in contact with all those impacted and has reset passcodes for 7.6 million current customers. It also said it will offer credit monitoring wherever applicable.

Where applicable? Wouldn't it be easier to offer it across the board?


"Productivity" is such a misleading word here...


"Platform company" means multi-chip in this case?

Seems logical since it's becoming impractical to cram so many transistors on a single die.


I don't really understand the bird's-eye view of the product line, but judging by some of the raw physical numbers and configurations Jensen was bragging about, it means that they want to basically play the mainframe game of locking high-end applications into proprietary middleware running on proprietary chassis with proprietary cluster interconnect (hello, Mellanox acquisition).


The lock-in is more of a bonus for them. The underlying problem is that it's impossible to build a chip big enough, or even a collection of chiplets big enough. Training LLMs requires more silicon than can fit on one PCB, so they need an interconnect that is as fast as possible. With interconnect bandwidth as a critical bottleneck, they're not going to wait around for the industry to standardize on a suitable interconnect when they can build what they need to be ready to ship alongside the chips they need to connect.


Cerebras: "Hold my beer."


In this case the interconnects are also doing compute.


It means all the main chips required for a large-scale datacenter. And many of the layers of software on top of it.

Hardware:

* The GPU

* The GPU-GPU fabric (NVLink)

* The CPU

* The NIC

* The network fabric (InfiniBand)

* The switch

And that's not even starting to get into the many layers of the software stack (CUDA, Riva, Megatron, Omniverse) that they're contributing and working to get folks to build on.


No, it means rent seeking.

Imagine AWS if they also sold all the computers in the world, so now you can only rent from them.


So, like IBM at the beginning of computing.


"For only 100$ a month, you'll be able to turn on the gpu you already paid for"

--Nvidia, pretty soon


This is sort of already a reality. Their vGPU functionality (partitioning a single physical GPU into multiple virtual GPUs) is already separately licensed - https://www.nvidia.com/en-us/data-center/buy-grid/

And that's once you've bought an expensive Tesla/Quadro GPU too.


Good. That only impacts the disgusting cloud providers who are hoarding all the hardware everywhere.

No sympathy for them.

