tetraodonpuffer's comments | Hacker News

When it comes to Bach, I am surprised more people don't mention pieces like this:

https://www.youtube.com/watch?v=tsxP-YjDWlQ (arioso from the cantata 156, here for oboe)

which I think stands up just fine against pretty much any other classical piece, baroque or not.

Personally I have a very big soft spot for his organ works, as I play (badly) some organ myself, and among those I don't see the trio sonatas recommended nearly often enough (here is a live recital of all of them, which is super impressive)

https://www.youtube.com/watch?v=eK9irE8LMAU

Among those I probably enjoy the Vivace of BWV 530 the most. Other favorite pieces are the Passacaglia and Fugue https://www.youtube.com/watch?v=nVoFLM_BDgs and the Toccata, Adagio and Fugue in C major https://www.youtube.com/watch?v=Klh9GiWMc9U (the Adagio especially is super nice), but there are so many. Among organists I often come back to Helmut Walcha, and am always amazed that, being blind, he was able to learn everything just by listening.


If you're going to give them the trio sonatas, you gotta give them the good one: https://www.youtube.com/watch?v=EOTtDYTc5JY&list=PLCDB42413B...

Put on a good set of headphones and go sit in the corner.

Also obligatory: https://www.youtube.com/watch?v=Ah392lnFHxM&list=RDAh392lnFH...

The thing I appreciate most about Bach is:

you can play it fast.

you can play it slow.

you can play it with an ensemble of random instruments.

you can play a single voicing all by itself.

all of it screams "musical", which, if you play, say, tuba or one of the larger instruments, is a godsend, as most of your lines in other pieces will bore you to death.


Nice to see the Zenph recording get some love. It's such a fascinating process they had to go through. It's way better than the original Gould recordings with all his singing along.


and you can throw away the metronome https://www.youtube.com/watch?v=_1xJoVzoIQg


Most cars these days have GPS and report their location and so on, so why can't manufacturers run these updates only at night, when the car is parked at home? There should be no reason for any OTA update to happen while the vehicle is running (or on a trip, etc.); downloading the OTA update, sure, but definitely not applying it. There should also be a documented procedure to restore the previous version in case an OTA update fails.


Any time there is a right turn you can still end up in this same situation, whether it's right turn on red or not, if the driver does not look to their right: there have been plenty of times I have been nearly run over when a car turning right on green did not notice that the pedestrian crossing light for the same direction was also green and I was about to cross.

Same thing for cars turning right in front of me when I'm riding my bike in the bike lane; it's just par for the course. Pedestrians should ALWAYS make eye contact with the driver before crossing, and cyclists should NEVER be side by side with a car when approaching an intersection.


If the light stays red while the "walk" sign is active (usually the case), it's a whole lot less likely that there will actually be a pedestrian there during the turn. There's also a bit more time (while waiting for the light) to see a bike approaching. Yes, all parties can still violate the law and accidents can still happen, but they become less likely.

https://www.codot.gov/safety/shift-into-safe-news/2025/march...


Thanks for the write-up! In my current application I have a few different scenarios that are a bit different from yours but still require processing aggregated data in order:

1. Reading from various files where each file has lines with a unique identifier I can use to process in order: I open all the files and build a min-heap from the first line of each, then process by repeatedly grabbing the lowest entry from the min-heap; after consuming a line from a file, I read the next line from that same file and push it back onto the min-heap (each min-heap cell keeps the open file descriptor for its file). See the first sketch after this list.

2. Aggregating across goroutines that service data generators with different latencies and throughputs. I have one goroutine interfacing with each and consider them "producers". Using a global atomic integer I can quickly assign a unique, increasing index to the messages coming in, and these can then be serviced with a min-heap, same as above (second sketch below). There are some considerations about dropping messages that are too old, so an alternative approach for some cases is to key the min-heap on received time and process only up to time.Now() minus some buffering time, to allow more time for things to settle before dropping anything (trading total latency for this).

3. Similar to the above, I have another scenario where ingestion throughput matters more and repeated processing happens in order, but there is no requirement that all messages have been processed every time, just that whatever is processed comes out in order (this is the backing for a log viewer). In this case I just slab-allocate and dump what I receive without ordering concerns, but I also keep a btree over the indexes that I iterate when it's time to process (third sketch below). I originally had this buffering like (2) to guarantee mostly ordered insertions in the slabs themselves (which I simply iterated over), but if a goroutine stalled, shifting the items in the slab when the late items finally came in became very expensive and could spiral badly.
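
A minimal sketch of the first scenario (the k-way merge), assuming each line is non-empty and starts with an integer identifier; the field layout and all names are illustrative, not my actual code:

  package main

  import (
    "bufio"
    "container/heap"
    "fmt"
    "os"
    "strconv"
    "strings"
  )

  // entry is one buffered line plus the scanner it came from, so the heap
  // knows where to read the replacement line after this one is consumed.
  type entry struct {
    id      int
    line    string
    scanner *bufio.Scanner
  }

  type minHeap []entry

  func (h minHeap) Len() int           { return len(h) }
  func (h minHeap) Less(i, j int) bool { return h[i].id < h[j].id }
  func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
  func (h *minHeap) Push(x any)        { *h = append(*h, x.(entry)) }
  func (h *minHeap) Pop() any {
    old := *h
    e := old[len(old)-1]
    *h = old[:len(old)-1]
    return e
  }

  // readOne parses the next line of a scanner; ok is false at EOF (or on a
  // line that does not start with an integer ID, which this sketch assumes
  // never happens in well-formed input).
  func readOne(s *bufio.Scanner) (entry, bool) {
    if !s.Scan() {
      return entry{}, false
    }
    line := s.Text()
    fields := strings.Fields(line)
    if len(fields) == 0 {
      return entry{}, false
    }
    id, err := strconv.Atoi(fields[0])
    if err != nil {
      return entry{}, false
    }
    return entry{id: id, line: line, scanner: s}, true
  }

  func main() {
    h := &minHeap{}
    for _, name := range os.Args[1:] {
      f, err := os.Open(name)
      if err != nil {
        panic(err)
      }
      defer f.Close()
      if e, ok := readOne(bufio.NewScanner(f)); ok {
        heap.Push(h, e)
      }
    }
    // Repeatedly emit the globally smallest buffered line, then refill from
    // the same file so that file stays represented in the heap.
    for h.Len() > 0 {
      e := heap.Pop(h).(entry)
      fmt.Println(e.line)
      if next, ok := readOne(e.scanner); ok {
        heap.Push(h, next)
      }
    }
  }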
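
A minimal sketch of the second scenario: producer goroutines stamp each message with a global atomic counter and a consumer drains them in index order through a min-heap. All names are illustrative; the time-based variant mentioned above would key the heap on the received timestamp instead and only drain entries older than the settle window:

  package main

  import (
    "container/heap"
    "fmt"
    "sync/atomic"
    "time"
  )

  var nextIdx atomic.Uint64 // shared by every producer goroutine

  type msg struct {
    idx      uint64    // global arrival order across all producers
    received time.Time // when the aggregator saw the message
    payload  string
  }

  // msgHeap orders messages by their global index; the alternative keys on
  // received and only pops entries older than time.Now() minus a buffer.
  type msgHeap []msg

  func (h msgHeap) Len() int           { return len(h) }
  func (h msgHeap) Less(i, j int) bool { return h[i].idx < h[j].idx }
  func (h msgHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
  func (h *msgHeap) Push(x any)        { *h = append(*h, x.(msg)) }
  func (h *msgHeap) Pop() any {
    old := *h
    m := old[len(old)-1]
    *h = old[:len(old)-1]
    return m
  }

  // stamp is what each producer goroutine calls as data comes in.
  func stamp(payload string) msg {
    return msg{idx: nextIdx.Add(1), received: time.Now(), payload: payload}
  }

  func main() {
    h := &msgHeap{}
    heap.Push(h, stamp("from a fast producer"))
    heap.Push(h, stamp("from a slow producer"))
    for h.Len() > 0 {
      m := heap.Pop(h).(msg)
      fmt.Println(m.idx, m.payload)
    }
  }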
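
A minimal sketch of the third scenario. My actual implementation keeps a btree over the indexes; a map plus a sorted key slice stands in for it here to keep the sketch dependency-free, but the idea is the same: the slab itself is never reordered, only the index is walked in order:

  package main

  import (
    "fmt"
    "sort"
  )

  type store struct {
    slab  []string       // log lines in arrival order, never shifted
    index map[uint64]int // message index -> position in the slab
  }

  func newStore() *store {
    return &store{index: make(map[uint64]int)}
  }

  // add dumps a message into the slab regardless of ordering; only the index
  // remembers where it logically belongs.
  func (s *store) add(idx uint64, line string) {
    s.index[idx] = len(s.slab)
    s.slab = append(s.slab, line)
  }

  // scan visits messages in index order without ever moving slab entries; a
  // btree keeps this walk cheap instead of re-sorting the keys on every call.
  func (s *store) scan(visit func(idx uint64, line string)) {
    keys := make([]uint64, 0, len(s.index))
    for k := range s.index {
      keys = append(keys, k)
    }
    sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
    for _, k := range keys {
      visit(k, s.slab[s.index[k]])
    }
  }

  func main() {
    s := newStore()
    s.add(2, "second line")
    s.add(1, "first line") // late arrival: nothing has to shift
    s.scan(func(idx uint64, line string) { fmt.Println(idx, line) })
  }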


Wow, that’s some seriously sophisticated stuff - it’s not that often you see a heap used in typical production code (outside of libraries)!

Your first example definitely gives me merge-sort vibes - a really clean way to keep things ordered across multiple sources. The second and third scenarios are a bit beyond what I’ve tackled so far, but super interesting to read about.

This also reminded me of a WIP PR I drafted for rill (probably too niche, so I’m not sure I’ll ever merge it). It implements a channel buffer that behaves like a heap - basically a fixed-size priority queue where re-prioritization only happens for items that pile up due to backpressure. Maybe some of that code could be useful for your future use cases: https://github.com/destel/rill/pull/50


Hah, not sure about "production"; I am currently between jobs and taking advantage of that to work on a docker/k8s/file TUI log viewer.

I am using those techniques respectively for loading backups (I store each container log in a separate file inside a big zip file, which allows concurrent reading without unpacking) and for servicing the various log-producing goroutines (which use the Docker/k8s APIs as well as fsnotify for files), since I allow creating "views" of containers that consequently need to aggregate in order. The TUI itself, using tview, runs in a separate goroutine at a configurable FPS, reading from these buffers.

I have things mostly working; the latest significant refactoring was introducing the btree-based reading after noticing the "fix the order" stalls were too bad, and I am planning to do a Show HN when I'm finished. It has been a lot of fun going back to solo-dev greenfield stuff after many years of architecture-focused work.

I definitely love Go, but despite being careful and having access to great tools like rr and dlv in GoLand, it can sometimes get difficult to debug deadlocks, especially when mixing channels and locks. I have found this library quite useful for chasing them down in some scenarios: https://github.com/sasha-s/go-deadlock
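
For anyone curious, it is essentially a drop-in replacement for sync's mutex types; a minimal sketch of how it can be wired in (the struct and names here are illustrative, not from my actual code):

  package main

  import (
    "time"

    deadlock "github.com/sasha-s/go-deadlock"
  )

  // logBuffer uses deadlock.Mutex instead of sync.Mutex; swapping the type
  // back for release builds removes the runtime checking overhead.
  type logBuffer struct {
    mu    deadlock.Mutex
    lines []string
  }

  func (b *logBuffer) append(line string) {
    b.mu.Lock()
    defer b.mu.Unlock()
    b.lines = append(b.lines, line)
  }

  func main() {
    // Report a potential deadlock whenever a Lock() waits longer than this.
    deadlock.Opts.DeadlockTimeout = 5 * time.Second
    b := &logBuffer{}
    b.append("hello")
  }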


I think what would help is if any F2P game, to be listed in the app store, were mandated to never cost more than $x/year, possibly with different tiers a game could choose from ($10/$100/$1000) based on the maximum yearly spend. The game should also prominently display the total spend per year, and lifetime, every time it is launched.

Although I do not like F2P because of all the dark patterns (which have unfortunately infiltrated non-F2P as well), if it were capped at a reasonable maximum amount per year, with no player-to-player trading at all and no multiple accounts per store account, it could perhaps be made much less predatory while still keeping it financially sustainable for the companies that produce the games.


Why don't you think this would work? Technically this is basically "the (SP) site trusts another (IDP) site to sign/encrypt a JWT containing some custom assertions". The user would go to the SP, get a signed blob (session nonce / expiry / whatever), take that to the IDP, log in there, the IDP creates a JWT with the original blob plus any assertions you allow, you post the JWT back to the SP, the SP decrypts/verifies the IDP packet, gets its own nonce back, ties you to the session, done.

There are also obviously better ways (https://blog.cloudflare.com/privacy-pass-standard/, possibly some variation of zero-knowledge proofs), but technically this seems like a solvable problem. Money-wise, the IDP, or the verifier in general, can charge users for an account and/or for generated assertions.
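
To make the shape of this concrete, here is a minimal sketch of that round trip in Go with the golang-jwt library; the shared-secret signing and the claim names are illustrative simplifications (a real deployment would verify against the IDP's public key):

  package main

  import (
    "fmt"
    "time"

    "github.com/golang-jwt/jwt/v5"
  )

  var (
    spSecret  = []byte("sp-signing-key")  // known only to the SP
    idpSecret = []byte("idp-signing-key") // SP holds the verification side
  )

  // Step 1: the SP hands the user a signed blob tying a session nonce to an expiry.
  func spIssueBlob(nonce string) (string, error) {
    t := jwt.NewWithClaims(jwt.SigningMethodHS256, jwt.MapClaims{
      "nonce": nonce,
      "exp":   time.Now().Add(5 * time.Minute).Unix(),
    })
    return t.SignedString(spSecret)
  }

  // Step 2: after the user logs in, the IDP wraps the SP blob plus whatever
  // assertion it is willing to make into its own JWT.
  func idpIssueAssertion(spBlob string) (string, error) {
    t := jwt.NewWithClaims(jwt.SigningMethodHS256, jwt.MapClaims{
      "sp_blob":  spBlob,
      "verified": true,
      "exp":      time.Now().Add(5 * time.Minute).Unix(),
    })
    return t.SignedString(idpSecret)
  }

  // Step 3: the SP checks the IDP signature, then its own embedded blob, and
  // ties the assertion back to the original session nonce.
  func spVerify(idpToken string) (nonce string, verified bool, err error) {
    outer, err := jwt.Parse(idpToken, func(*jwt.Token) (any, error) { return idpSecret, nil })
    if err != nil {
      return "", false, err
    }
    claims := outer.Claims.(jwt.MapClaims)
    inner, err := jwt.Parse(claims["sp_blob"].(string), func(*jwt.Token) (any, error) { return spSecret, nil })
    if err != nil {
      return "", false, err
    }
    return inner.Claims.(jwt.MapClaims)["nonce"].(string), claims["verified"].(bool), nil
  }

  func main() {
    blob, _ := spIssueBlob("session-123")
    assertion, _ := idpIssueAssertion(blob)
    fmt.Println(spVerify(assertion))
  }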


The problem is not technical, it's human :)


According to https://en.wikipedia.org/wiki/Ispell, ispell (1971) already used Levenshtein distance (although the article does not state whether this already existed in the original version or was added in later years).


Levenshtein distance up to 1, according to that article. If you have a hierarchical structure of valid words (a trie or a DAG; in some sense, a DAG is a trie, but stored more efficiently, with the disadvantage that adding or removing words is hard), it is not hard to check which words satisfy that. If you only do the inexact search after looking for the exact word and finding it missing, I think it also won't be too slow when given 'normal' text to spell-check.
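
A minimal, dictionary-agnostic sketch of the distance-1 test itself (Go, purely illustrative); a trie walk applies the same idea incrementally instead of testing stored words one by one:

  package main

  import "fmt"

  // withinOne reports whether a and b are within Levenshtein distance 1, i.e.
  // equal, or differing by exactly one substitution, insertion, or deletion.
  func withinOne(a, b string) bool {
    if len(a) < len(b) {
      a, b = b, a // make a the longer (or equal-length) word
    }
    if len(a)-len(b) > 1 {
      return false
    }
    i, j, edits := 0, 0, 0
    for i < len(a) && j < len(b) {
      if a[i] == b[j] {
        i, j = i+1, j+1
        continue
      }
      edits++
      if edits > 1 {
        return false
      }
      if len(a) == len(b) {
        i, j = i+1, j+1 // count a substitution
      } else {
        i++ // skip one character of the longer word (a deletion)
      }
    }
    return edits+(len(a)-i) <= 1 // a trailing extra character is one more edit
  }

  func main() {
    fmt.Println(withinOne("ispell", "ispel")) // true: one deletion
    fmt.Println(withinOne("spell", "smell"))  // true: one substitution
    fmt.Println(withinOne("spell", "spools")) // false: two edits needed
  }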


I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

As an aside, I am not sure why the technology to spread a model across multiple cards is quite mature for LLMs, while for image models, despite also using GGUFs, this has not been the case. Maybe as image models become bigger there will be more of a push to implement it.


40GB is small IMO: you can run it on a mid-tier MacBook Pro... or the smallest M3 Ultra Mac Studio! You don't need Nvidia if you're doing at-home inference; Nvidia only becomes economical at very high throughput, i.e. dedicated inference companies. Apple Silicon is much more cost-effective for single-user use with small-to-medium-sized models. The M3 Ultra is roughly on par with a 4090 in terms of memory bandwidth, so it won't be much slower, although it won't match a 5090.

Also, for a 20B model you only really need 20GB of VRAM: FP8 is near-identical to FP16, and it's only below FP8 that you start to see dramatic drop-offs in quality. So literally any Mac Studio available for purchase will do, and even a fairly low-end MacBook Pro would work as well. And a 5090 should be able to handle it with room to spare.


Memory bandwidth is only relevant when comparing LLM performance. For image generation, the limiting factor is compute, and Apple sucks at it.


If you want to wait 20 minutes for one image, you can certainly run it on a MacBook Pro.


The quality doesn't have to get much higher for that to be a great deal. For humans the wait time is typically measured in days.


Tell me you have no experience with generative AI image models or with human artists.


What experience do you want to point to? I've never seen an artist streaming who can draw something equivalent to a good piece of AI artwork in 20 minutes. Their advantage right now comes from a higher overall cap on the quality of the work. Minute for minute, AIs are much better. It is just that it is pointless giving a typical AI more than a little time on a GPU, because current models can't consistently improve their own work.


"a good piece of AI artwork"

You really don't understand art. At all.


If you need a hug, I suspect unfortunately I am on the wrong continent. Try thinking some positive thoughts.


Does M3 Ultra or later have hardware FP8 support on the CPU cores?


Ah, you're right: it doesn't have dedicated FP8 cores, so you'd get significantly worse performance (a quick Google search implies 5x worse). Although you could still run the model, just slowly.

Any M3 Ultra Mac Studio, or a midrange-or-better MacBook Pro, would handle FP16 with no issues though. A 5090 would handle FP8 like a champ, and a 4090 could probably squeeze it in as well, although it'd be tight.


All of this only really applies to LLMs though. LLMs are memory bound (due to higher param counts, KV caching, and causal attention) whereas diffusion models are compute bound (because of full self attention that can't be cached). So even if the memory bandwidth of an M3 ultra is close to an Nvidia card, the generation will be much faster on a dedicated GPU.


If it's 40GB, you can lightly quantize it and fit it on a 5090.


Which very few people have, comparatively.

Training it will also be out of reach for most. I’m sure I’ll be able to handle it on my own 5090 at some point but it’ll be slow going.


> I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

40 GB of VRAM? So two GPUs with 24 GB each? That's pretty reasonable compared to the kind of machine needed to run the latest Qwen coder models (which, by the way, are close to SOTA: they even beat proprietary models on several benchmarks).


A 3090 + 2x Titan Xp? Technically I have 48GB, but I don't think you can "split it" over multiple cards. At least with Flux, it would OOM the Titans and allocate the full 3090.


You can’t split image models over 2 GPUs like you can LLMs.


They also released an inference server for their models. Wan and qwen-image can be split without problems. https://github.com/modelscope/DiffSynth-Engine


Unless I missed something, just from skimming their tutorial it looks like they can do parallelism to speed things up with some models, not actually split the model (apart from the usual chunk-offloading techniques).


Having the keyboard the way it is also allows you to more easily orient yourself: you can feel with the sides of your fingers whether you are next to E/F or B/C, and out of the corner of your eye it's also straightforward to figure out. I don't think it'd be possible (or anyway it would be even more difficult than it is now) to play large jumps accurately if the whole keyboard looked the same.


I think both of those concerns were addressed by the Dvorak of piano keyboards: https://en.wikipedia.org/wiki/Jank%C3%B3_keyboard

Has the symmetry of GP while large jumps are accomplished by shifting up a row or two.

I assume it didn’t take off for the same reason Dvorak didn’t.


There are multiple alternative layouts that some advocate for. They generally do something else for orientation, like putting a bump on middle C and other places.


Make a dimple on every C key and paint it red.


That's all fine and dandy, except early Apple keyboards have the dimples one key over :p


This is why, if I were a sprinter at the TdF, I'd 100% be on a mechanical groupset (assuming the sponsors allowed me to); a missed shift in a sprint means losing, 100%.


You can use an electronic groupset as well. There are benefits to it. Shimano says it's not possible, but you can just connect the shift levers with a wire to the rear derailleurs [1]. They will not respond to wireless signals afterwards. I tried to do the replay attack and it failed. That way you get the best of both worlds.

[1] https://bettershifting.com/installation-guide/connect-12-spe...

