I had this same issue early on when trying to adopt Obsidian. I was overwhelmed by all the "systems" and worried I was creating a headache for myself later on. Now I just focus on dumping text in, using search, and linking only as needed. Basically, don't overdo it.
That makes sense for code or technical text, but it is less relevant for car UIs. In an infotainment system you almost never see ambiguous strings where O vs 0 or I vs l matters. Everything is highly contextual, short, and glance-based. These fonts are tuned for distance, motion, glare, and quick recognition, not for reading arbitrary identifiers. If it tested poorly in real driving conditions that would be a real problem, but judging it by programmer font rules feels like the wrong yardstick.
I've had good results from the James Hoffmann recipe [0], although I brew inverted. You can push the plunger down with just the weight of your arm resting on it. For something very different, you can brew something not-quite-espresso using the Fellow Prismo cap for the Aeropress.
My personal opinion is that the whole point of the Aeropress is that you don't need to follow any recipe to get a good result. The parameters are extremely flexible, to the point of being close to foolproof. Start with good beans and water. Grind anywhere between French press and very fine pourover level. Brew anywhere between 1 minute and 8 minutes. Add anywhere between 100ml and 200ml of water. Press reasonably slowly.
The results will always be good. Maybe not the level you'd get with extremely high quality light roasted beans and a very careful pourover technique, but maybe an Aeropress isn't the best brewer for those beans in the first place.
We have been fine-tuning models using Axolotl and Unsloth, with a slight preference for Axolotl. Check out the docs [0] and fine-tune or quantize your first model. There is a lot to learn in this space, and it's exciting.
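If it helps to see the shape of it, here's a minimal sketch of a LoRA fine-tune using the transformers + peft stack that tools like Axolotl and Unsloth build on. The model name, data file, and hyperparameters are placeholders, not recommendations:

    # Rough LoRA fine-tuning sketch with transformers + peft.
    # Model, data file, and hyperparameters below are placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "meta-llama/Llama-3.1-8B"
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Attach small trainable adapters instead of updating all of the base weights.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

    # One "text" field per example; swap in your own task data.
    ds = load_dataset("json", data_files="train.jsonl")["train"]
    ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=1024),
                batched=True, remove_columns=ds.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", per_device_train_batch_size=2,
                               num_train_epochs=1, learning_rate=2e-4),
        train_dataset=ds,
        # mlm=False makes the collator copy input_ids into labels for causal LM loss.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained("out/adapter")  # writes only the small adapter weights

Axolotl wraps roughly this in a YAML config, and Unsloth swaps in faster kernels, but the moving parts are the same.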
When do you think fine-tuning is worth it over prompt engineering a base model?
I imagine with the finetunes you have to worry about self-hosting, model utilization, and then also retraining the model as new base models come out. I'm curious under what circumstances you've found that the benefits outweigh the downsides.
For self-hosting, there are a few companies that offer per-token pricing for LoRA finetunes (LoRAs are basically efficient-to-train, efficient-to-host finetunes) of certain base models:
- (shameless plug) My company, Synthetic (https://synthetic.new), supports LoRAs for Llama 3.1 8b and 70b. All you need to do is give us the Hugging Face repo and we take care of the rest. If you want other people to try your model, we charge usage to them rather than to you. (We can also host full finetunes of anything vLLM supports, although we charge by GPU-minute for full finetunes rather than the cheaper per-token pricing for supported base model LoRAs.)
- Together.ai supports a slightly wider range of base models than we do, with a bit more configuration required, and any usage is charged to you.
- Fireworks does the same as Together, although they quantize the models more heavily (FP4 for the higher-end models). However, they support Llama 4, which is pretty nice although fairly resource-intensive to train.
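To make the "give us the Hugging Face repo" part concrete: with a LoRA, the repo only holds the small adapter weights that peft saves, and you can sanity-check it locally against the matching base model before handing it to any host. Repo and model names below are made up:

    # Load a base model plus a LoRA adapter from the Hub (names are hypothetical).
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_name = "meta-llama/Llama-3.1-8B"
    tokenizer = AutoTokenizer.from_pretrained(base_name)
    base = AutoModelForCausalLM.from_pretrained(base_name)

    # Pulls adapter_config.json + adapter weights and applies them on top of the base.
    model = PeftModel.from_pretrained(base, "your-org/your-lora-adapter")

    prompt = "Extract the governing-law clause from the following contract:\n..."
    out = model.generate(**tokenizer(prompt, return_tensors="pt"), max_new_tokens=128)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

If a host needs a standalone model rather than base + adapter, peft's merge_and_unload() folds the adapter into the base weights.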
If you have reasonably good data for your task, and your task is relatively "narrow" (e.g. find a specific kind of bug rather than general-purpose coding; extract a specific kind of data from legal documents rather than general-purpose reasoning about social and legal matters; etc.), finetunes of even a very small model like an 8b will typically outperform even very large SOTA models by a pretty wide margin, while being a lot cheaper to run. For example, if you find yourself hand-coding heuristics to fix some problem you're seeing with an LLM's responses, it's probably more robust to train a small finetuned model on that data and have it fix the issues rather than writing hardcoded heuristics. On the other hand, no amount of finetuning will make an 8b model a better general-purpose coding agent than Claude 4 Sonnet.
Most inference companies (Synthetic included) host in a mix of the U.S. and EU — I don't know of any that promise EU-only hosting, though. Even Mistral doesn't promise EU-only AFAIK, despite being a French company. I think at that point you're probably looking at on-prem hosting, or buying a maxed-out Mac Studio and running the big models quantized to Q4 (although even that couldn't run Kimi: you might be able to get it working over ethernet with two Mac Studios, but the tokens/sec will be pretty rough).
This is what I use, and it works well. It's very straightforward to add apps and have them update automatically as new releases are pushed to GitHub or wherever they're hosted.
This idea feels a little like bullet journaling or Logseq [0] to me. For what it's worth, I do this in Obsidian and clean up my thoughts on a regular basis. It hits the right balance of minimalism and usefulness for me.
Is this quote real? I'm familiar with George Pólya's, "If you cannot solve the proposed problem, try to solve first a simpler related problem" but I cannot find any source for the Lenstra quote.
Textract is more expensive than this (for your first 1M pages per month at least) and significantly more than something like Gemini Flash. I agree it works pretty well though - definitely better than any of the open source pure OCR solutions I've tried.
Yeah, that's a fun challenge. What we've seen work well is a system that forces the LLM to generate citations for all extracted data, maps those back to the original OCR content, and then generates bounding boxes from there. There are tons of edge cases for sure, and we've built up a suite of heuristics for them over time, but overall it works really well.
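A stripped-down version of that mapping step might look something like this; it assumes the OCR side already gives you word-level boxes, and everything here is illustrative rather than our exact pipeline:

    # Sketch: map an LLM-cited snippet back to OCR words and derive a bounding box.
    # Assumes ocr_words is a list of {"text": str, "box": (x0, y0, x1, y1)} entries.
    from difflib import SequenceMatcher

    def find_citation_box(citation, ocr_words, min_ratio=0.8):
        target = citation.lower().split()
        best_span, best_score = None, 0.0
        # Slide a window of the same word count over the OCR words and fuzzy-match it.
        for i in range(len(ocr_words) - len(target) + 1):
            window = ocr_words[i:i + len(target)]
            window_text = " ".join(w["text"].lower() for w in window)
            score = SequenceMatcher(None, " ".join(target), window_text).ratio()
            if score > best_score:
                best_span, best_score = window, score
        if best_span is None or best_score < min_ratio:
            return None  # the citation didn't match the OCR text closely enough
        # Union of the matched words' boxes.
        return (min(w["box"][0] for w in best_span), min(w["box"][1] for w in best_span),
                max(w["box"][2] for w in best_span), max(w["box"][3] for w in best_span))

The edge cases mostly show up when a citation spans lines or columns, or when the OCR text is noisier than what the LLM quoted back.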
I'm working on a project that uses PaddleOCR to get bounding boxes. It's far from perfect, but it's open source and good enough for our requirements. And it can mostly handle a 150 MB single-page PDF (don't ask) without completely keeling over.
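For anyone curious, getting boxes out of it is pretty minimal. This is the older 2.x-style API; the result layout may differ in newer releases:

    # PaddleOCR 2.x-style usage; output layout may differ in newer versions.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(lang="en")   # downloads detection + recognition models on first run
    pages = ocr.ocr("page.png")  # one result list per page/image

    for line in pages[0]:
        box, (text, confidence) = line  # box is four (x, y) corner points
        print(text, confidence, box)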