As a math guy who loves reality tv, I was also drawn to the show and wrote a blog post [0] about how to programmatically calculate the probabilities as the show progresses. It was a lot of fun optimizing it to be performant. You can `pip install ayto` to use it to follow along with the show or try out scenarios.
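Roughly, the core idea is brute force over the space of matchings: enumerate every possible pairing, throw out the ones that contradict the truth booths and beam counts, and read probabilities off what's left. Here's a toy sketch with 4 couples (not the package's actual API, and the evidence is made up):

```python
from itertools import permutations
from fractions import Fraction

N = 4  # toy season; the real show has 10-11 couples
candidates = list(permutations(range(N)))  # candidate[i] = woman matched to man i

# hypothetical evidence: truth booth said (man 0, woman 2) is a "no",
# and seating man i with woman seating[i] at a ceremony earned exactly 1 beam
seating = (1, 2, 3, 0)
candidates = [c for c in candidates if c[0] != 2]
candidates = [c for c in candidates
              if sum(w == s for w, s in zip(c, seating)) == 1]

def prob(man, woman):
    """P(man is matched with woman | evidence so far)."""
    return Fraction(sum(c[man] == woman for c in candidates), len(candidates))

print(prob(0, 1), len(candidates))
```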
The linked post is a very thorough treatment of AYTO and a great read. I really like the "guess who" bit on how to maximize the value of guesses. It's a shame the participants aren't allowed to have pen and paper—it makes optimization a lot trickier! I'm impressed they do as well as they do.
Thanks! Optimization was something I'd played with in previous rounds of coding up AYTO simulations, but not in the most recent version. (See the bottom section of this notebook [0]). There's also a very thorough treatment of the problem in a blog post from 2018 by SAS (the software company) [1]. It's surprising how many people have been drawn in by the allure of AYTO!
And sometimes they just don't do better, as a plot point: staying together an extra week after finding out they're not the one, because of the intensity of their love (they met four days before).
Giving them more credit than they probably deserve but: when you're solving "by hand" like they are in the show, keeping a known non-match couple together may actually be helpful for interpreting the results of a matchup ceremony because you'll know that that couple didn't contribute to the beams.
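A toy illustration of why: if the known non-match couple sits together at the ceremony, they can't account for any of the beams, so every beam you see must come from the remaining couples, which cuts down the hypothesis space much harder. (4 couples, made-up numbers:)

```python
from itertools import permutations

hypotheses = list(permutations(range(4)))          # all possible matchings
hypotheses = [p for p in hypotheses if p[0] != 0]  # truth booth: (0, 0) is a no

def beams(truth, seating):
    return sum(t == s for t, s in zip(truth, seating))

# ceremony: seat the known non-match (0, 0) together anyway, observe 2 beams
seating = (0, 1, 2, 3)
consistent = [p for p in hypotheses if beams(p, seating) == 2]

# since couple (0, 0) can't have produced a beam, both beams must come from
# couples 1-3, which pins down the remaining possibilities much more tightly
print(len(hypotheses), "->", len(consistent))
```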
I've been building a little toy computer and assembly language that's interpreted in Python. It's pretty close to the first release (and an introductory blog post), and it's been a lot of fun to build (and a chance to learn a bit more about real assembly as I go).
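For anyone curious what "interpreted in Python" means in practice, the heart of something like this is just a fetch-decode-execute loop. This isn't my project's actual instruction set, just the general shape:

```python
# a minimal fetch-decode-execute loop for a made-up 3-instruction machine
def run(program):
    regs = {"A": 0, "B": 0}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "LOAD":       # LOAD reg, immediate
            regs[args[0]] = args[1]
        elif op == "ADD":      # ADD dst, src
            regs[args[0]] += regs[args[1]]
        elif op == "PRINT":    # PRINT reg
            print(regs[args[0]])
        pc += 1

run([("LOAD", "A", 2), ("LOAD", "B", 3), ("ADD", "A", "B"), ("PRINT", "A")])
```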
I gave a very short talk (now a blog post) about embeddings and how we use them to bridge the gap between human notions of understanding and digital representations. It might be of interest to people who enjoyed this post: https://danturkel.com/2025/03/10/ignite-machine-understandin...
> I doubt that any recommendation system is capable of providing meaningful results in absence of the "awareness" about the actual content (be it music, books, movies or anything else) of what it's meant to recommend.
Years of experience have proven that you can get quite far with pure collaborative filtering—no user features, no content features. It's a very hard baseline to beat. A similar principle applies to language modeling: from word2vec to transformers, language models never rely on any additional information about what a token "means," only how the tokens relate to each other.
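To make "pure collaborative filtering" concrete, here's a toy matrix factorization that sees nothing but the interaction matrix itself (no user attributes, no item content) and still fills in the blanks; the ratings are made up:

```python
import numpy as np

# toy user-item rating matrix (0 = unobserved); no user or item features anywhere
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)
observed = R > 0

k, lr, reg = 2, 0.01, 0.05
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # latent user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))  # latent item factors

for _ in range(5000):  # plain gradient descent on the observed cells only
    err = observed * (R - U @ V.T)
    U, V = U + lr * (err @ V - reg * U), V + lr * (err.T @ U - reg * V)

print(np.round(U @ V.T, 1))  # predictions, including the previously empty cells
```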
I recently had a letter published in The New Yorker in response to Andrew Marantz's (excellent) story "Among the A.I. Doomsayers." In particular, I wanted to highlight that extremist doomers and accelerationists are not the only ones concerned about AI's future. In the post, I elaborate a little on the letter and provide links to further reading.
Hey Francois, congrats to you and the team on the launch! I've generally chosen PyTorch over TensorFlow for my day to day, but now that Keras is framework agnostic I'm excited to revisit it.
One thing I'm wondering about is whether it's possible (or necessary?) to use Keras in concert with PyTorch Lightning. In some ways, Lightning evolved to be "Keras for PyTorch," so what is the path forward in a world where both exist as options for PyTorch users—do they interoperate or are they competitors/alternatives to each other?
Both Keras models/layers (with the PyTorch backend) and Lightning Modules are PyTorch Modules, so they should be able to interoperate with each other in a PyTorch workflow. We have not tried this with Lightning, but we've had a good experience with custom PyTorch Modules.
More broadly, it's feasible to use Keras components with any framework built on PyTorch or JAX in the sense that it's always possible to write "adapter layers" that wrap a Keras layer and make it usable by another framework, or the other way around. We have folks doing this to use Flax components (from JAX) as Keras layers, and inversely, to use Keras layers as Flax Modules.
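A minimal sketch of the PyTorch direction (untested as written; it assumes the torch backend, under which Keras layers are torch.nn.Module instances, and since a LightningModule is also just a torch Module the same composition should carry over there):

```python
import os
os.environ["KERAS_BACKEND"] = "torch"  # must be set before importing keras

import keras
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # with the torch backend, this Keras layer is itself a torch.nn.Module
        self.dense = keras.layers.Dense(16, activation="relu")
        self.head = torch.nn.Linear(16, 1)  # plain torch layer alongside it

    def forward(self, x):
        return self.head(self.dense(x))

net = Net()
print(net(torch.randn(8, 4)).shape)  # torch.Size([8, 1])
```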
The biggest immediate useful difference that I see is that Annoy uses read-only index files (from the docs: "you can not add more items once the tree has been created" [0]), while Voyager allows you to call `.add_item` at any time (I just pip installed to double check and yes -- it's great).
The great thing about Annoy is that you can write the index to disk and thus do big data work on tiny workers at the edge. I've never really seen anything else do the same.
Oh yeah, Annoy is definitely mmap'ed i.e. you can use the index without loading the index file into memory. And that's very useful.
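For reference, the build-once / mmap-on-load workflow looks roughly like this (following the pattern in Annoy's README; sizes and data are made up):

```python
from annoy import AnnoyIndex
import random

dim = 40

# build once (e.g. on a big machine)
index = AnnoyIndex(dim, "angular")
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])
index.build(10)            # 10 trees; the index is read-only from here on
index.save("vectors.ann")

# on the tiny edge worker: load() mmaps the file, so the whole
# index never has to be resident in memory
query_index = AnnoyIndex(dim, "angular")
query_index.load("vectors.ann")
query = [random.gauss(0, 1) for _ in range(dim)]
print(query_index.get_nns_by_vector(query, 5))
```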
As far as I can see, Voyager requires you to load the index into memory and doesn't (yet?) do mmap. Which... would make sense since you can change the data after loading the index. So, Voyager index files are fully loaded in memory..? Do I have this right?
To this point, if you're releasing minor-version changelogs, expect them to be posted on sites like HN, and make sure you link at the top of the post to the announcement of the latest major release, for those who might have missed it! It's an easy marketing win!
Minor releases (and plenty of not-so-minor ones) are moderated away as mostly off-topic on HN, since they turn into generic discussion of the project itself, which is more often than not a dupe. The saner thing is to just not post them/flag them.
I imagine that will pick up somewhat in about 20 minutes, or rather new articles will start to hit over the next 2 hours (the Apple event starts at 1pm ET).
Personally, I appreciate the fix for "Reveal in Finder" hanging a system app (Finder). I also wanted to see what people think of the Properties feature, which hasn't been discussed on HN yet.
This looks really cool. One thing I've wondered about with, e.g., the OpenAI API is whether JSON is really a good format for passing embeddings back and forth. I'd think that passing floats as text over the wire wastes a ton of space that could add up, and might even sacrifice some precision. Would it be better to encode at least the vectors as binary blobs, or else use something like protobuf to more efficiently handle sending tons of floats around?
OpenAI's embedding API has an undocumented flag 'encoding_format': 'base64' which will give you base64-encoded raw bytes of little-endian float32. As it is used by the official python client, it is unlikely to go away.
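Decoding that on the client side is only a couple of lines, assuming the format is as described (little-endian float32); the round trip below uses a made-up vector rather than an API call:

```python
import base64
import numpy as np

def decode_embedding(b64: str) -> np.ndarray:
    # base64 -> raw bytes -> little-endian float32 array
    return np.frombuffer(base64.b64decode(b64), dtype="<f4")

# round-trip demo
vec = np.array([0.1, -0.2, 0.3], dtype="<f4")
encoded = base64.b64encode(vec.tobytes()).decode("ascii")
print(decode_embedding(encoded))
```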
I totally agree when you're talking about a bunch of embeddings at once-- that's why the document level endpoint (and the token-level embedding endpoint) can optionally return a link to a zip file containing the JSON. For a single embedding, not sure it matters that much, and the extra convenience is nice.
Edit: One other thing is that you can store the JSON in SQLite using the JSON data type and then use the nice querying constructs directly at the database level, which is nice for the token-level embeddings and document embeddings. This is built in to my project.
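For example (the table and column names here are just illustrative, not necessarily how my project lays things out; this needs a SQLite build with the JSON functions, which modern Python ships with):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, embedding TEXT)")
conn.execute(
    "INSERT INTO docs (embedding) VALUES (?)",
    (json.dumps([0.12, -0.5, 0.33]),),
)

# pull a single component out of the JSON at the database level
print(conn.execute(
    "SELECT json_extract(embedding, '$[1]') FROM docs"
).fetchone())  # (-0.5,)

# or explode the array with json_each, e.g. to find the largest component
print(conn.execute(
    "SELECT MAX(value) FROM docs, json_each(docs.embedding)"
).fetchone())  # (0.33,)
```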
[0]: https://danturkel.com/2023/01/25/math-code-are-you-the-one.h...