I've observed something interesting: my toddler can quickly and somewhat effortlessly solve a simple physical puzzle, yet when I try to prompt large language models (LLMs) to guide me clearly through the solution, they struggle.
Even with clear photographs and detailed instructions, LLMs tend to give general advice or indirect methods rather than precise, actionable steps that clearly demonstrate correctness.
Have you tried something similar? Have you successfully "prompt-engineered" an LLM into giving clear, precise, step-by-step solutions for physical puzzles? If yes, what's your approach?
Kudos to the author for diving in and uncovering the real story here. The Python 3.14 tail-call interpreter is still a nice improvement (any few-percent gain in a language runtime is hard-won), just not a magic 15% free lunch. More importantly, this incident gave us valuable lessons about benchmarking rigor and the importance of testing across environments. It even helped surface a compiler bug that can now be fixed for everyone’s benefit. It’s the kind of deep-dive that makes you double-check the next big performance claim. Perhaps the most thought-provoking question is: how many other “X% faster” results out there are actually due to benchmarking artifacts or unknown regressions? And how can we better guard against these pitfalls in the future?
I guess the bigger question for me is, how was a 10% drop in Python performance not detected when that faulty compiler feature was pushed? Do we not benchmark the compilers themselves? Do the existing benchmarks on the compiler or Python side not use that specific compiler?
As far as I'm aware, the official CPython binaries on Linux have always been built with GCC, so you'd have to build CPython yourself with both Clang 18 and Clang 19 to notice the difference. I think that's partly why the regression went unnoticed for so long.
This post is a perfect reminder of Gandhi’s advice: "Be the change you want to see in the world." We can’t expect kids to trust or wait for better outcomes unless we first model consistency and patience ourselves. Our everyday actions—keeping promises, showing reliability—are the real lessons that shape their future.
Cool. To put on a bit of a tin-foil hat: how do we know the model isn't tuned in ways that we in the West would consider censorship or misinformation?
I haven't deployed PipeGate on AWS; it started as a personal curiosity project. There are probably many security aspects that would need to be addressed before deploying it in a production environment.
Thanks for the detailed response—lots of good points here.
IT's main concern is blocking any external traffic from hitting local machines, even if tunneled or encrypted. Outbound traffic is generally fine, but many external platforms require direct URLs for real-time webhooks, which makes solutions like ngrok hard to replace.
Polling an external queue from the LAN is an interesting idea—I’ll check if IT might allow that. Long-lived tunnels like SSH or Cloudflare are probably a no-go given their restrictions.
So far, no comments regarding compliance or privacy.
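The polling idea above could be sketched roughly like this. This is a hypothetical outline, not PipeGate code: `fetch` stands in for whatever outbound call retrieves pending webhook events from an external queue service (the endpoint, payload shape, and names are all assumptions), and `handle` is your local processing.

```python
import time

def poll_for_webhooks(fetch, handle, interval=5.0, max_polls=None):
    """Outbound-only alternative to a tunnel: poll an external queue
    from inside the LAN instead of accepting inbound connections.

    fetch()  -> list of pending webhook events (outbound HTTPS call)
    handle(event) -> process one event locally
    max_polls limits iterations (None = run forever, daemon-style).
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for event in fetch():
            handle(event)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)  # back off between polls
```

Since all traffic is outbound, this pattern is usually much easier to get past an IT policy that blocks inbound tunnels.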
PGQueuer is a lightweight job queue for Python, built entirely on PostgreSQL. It uses SKIP LOCKED for efficient and safe job processing, with a minimalist design that keeps things simple and performant.
If you’re already using Postgres and want a Python-native way to manage background jobs without adding extra infrastructure, PGQueuer might be worth a look: GitHub - https://github.com/janbjorge/pgqueuer
Also https://github.com/TkTech/chancy for another (early) Python option that goes the other way and aims to have bells and whistles included like a dashboard, workflows, mixed-mode workers, etc...
Check out the Similar Projects section in the docs for a whole bunch of Postgres-backed task queues. Haven't heard of pgqueuer before, another one to add!
I always wondered about the claim that SKIP LOCKED is all that efficient. Surely there are lots of cases where this is a really suboptimal pattern.
Simple example: if you have a mix of very short jobs and longer-duration jobs, hundreds or thousands of short jobs might execute for each long one. In that case, the rows for the long-running jobs get skipped over hundreds of times. The more long-running jobs in flight concurrently, the more wasted work as locked rows are skipped again and again. It wouldn't be a huge issue at low load, but surely a design where rows are moved to a separate "running" table would be more efficient in such cases. I can think of several other scenarios where SKIP LOCKED would lead to lots of wasted work.
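To make the wasted-work argument concrete, here's a toy single-worker simulation (pure Python, not PGQueuer or Postgres code): rows held by long-running jobs stay "locked" for the whole drain, and every scan re-examines and skips them before reaching the next available row.

```python
def count_skips(jobs, long_running):
    """Count how often locked rows get re-examined under a
    SKIP LOCKED-style scan.

    jobs: job ids in queue order.
    long_running: ids that stay locked for the entire drain.
    Each scan claims the first unlocked job; every locked row
    ahead of it is skipped (re-examined) again on that scan.
    """
    skips = 0
    remaining = list(jobs)
    while any(j not in long_running for j in remaining):
        for j in remaining:
            if j in long_running:
                skips += 1  # locked row visited and skipped yet again
            else:
                remaining.remove(j)  # claim this job
                break
    return skips
```

With just two long jobs parked at the head of the queue and 100 short jobs behind them, the scan re-visits the locked rows 200 times, which is the quadratic-ish overhead the comment describes.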
Good point about SKIP LOCKED inefficiencies with mixed-duration jobs. In PGQueuer's benchmarks, throughput reached up to 18k jobs/sec, showing it can handle high concurrency well. For mixed workloads, strategies like batching or partitioning by job type can help.
While a separate "running" table reduces skips, it adds complexity. SKIP LOCKED strikes a good balance for simplicity and performance in many use cases.
One known issue is that vacuum becomes a problem if load is sustained for long periods, leading to table bloat.
>One known issue is that vacuum becomes a problem if load is sustained for long periods, leading to table bloat.
Generally what you need to do there is have some column that can be sorted on that you can use as a high watermark. This is often an id (PK) that you either track in a central service or periodically recalculate. I've worked at places where this was a timestamp as well. Perhaps not as clean as an id but it allowed us to schedule when the item was executed. As a queue feature this is somewhat of an antipattern but did make it clean to implement exponential backoff within the framework itself.
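The timestamp-based variant described above can be sketched as a small helper. This is a hypothetical illustration, not code from any particular framework: `scheduled_at` is an assumed column name, and the dequeue query would order on it and use it as the high watermark.

```python
import datetime as dt

def next_attempt(now, attempt, base=2.0, cap=300.0):
    """Compute the next scheduled_at for a failed job using
    exponential backoff: delay = min(base ** attempt, cap) seconds.

    Writing this timestamp back to the row lets the dequeue query
    filter on `scheduled_at <= now()` and sort on it, so the same
    column doubles as the high-watermark and the backoff mechanism.
    """
    delay = min(base ** attempt, cap)
    return now + dt.timedelta(seconds=delay)
```

For example, attempt 3 yields an 8-second delay, while attempt 20 is capped at the 5-minute ceiling rather than growing unboundedly.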
I think PGQueuer's main advantage is simplicity: no extra infrastructure is needed, since it runs entirely on PostgreSQL. That makes it ideal for projects already using Postgres, where the operational model is familiar. While it may lack the advanced features or scalability of dedicated systems like Kafka or RabbitMQ, it's a great choice for lightweight background processing without the overhead of additional services.
I'm excited to announce the release of PGQueuer v0.15.0!
PGQueuer is a minimalist, high-performance job queue library for Python that leverages PostgreSQL's robustness, designed for real-time, high-throughput background job processing.
Key Features in This Release:
- Recurring Job Scheduling: You can now easily schedule recurring jobs using cron-like expressions with the new SchedulerManager. Ideal for automating routine tasks like data synchronization or cleanups.
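As a toy illustration of what cron-like matching involves (this is not PGQueuer's actual parser or the SchedulerManager API, just a minimal sketch covering only minute and hour fields):

```python
def cron_matches(expr, minute, hour):
    """Tiny cron matcher for a two-field "M H" expression, where each
    field is either '*' or a single integer. Real cron syntax also
    supports day/month/weekday fields, ranges, lists, and steps.
    """
    minute_field, hour_field = expr.split()

    def field_ok(field, value):
        return field == "*" or int(field) == value

    return field_ok(minute_field, minute) and field_ok(hour_field, hour)
```

A scheduler loop would evaluate each registered expression against the current time once per minute and enqueue the job when it matches.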
Love it. My new backend app stack is Postgres/FastAPI/PGQueuer. No need for Redis. Perfect for backends that need a fleet of workers to generate, e.g. embeddings.
I should also add asyncpg-trek for migrations to that stack above. Took me a while to find a simple Postgres migration tool that just uses asyncpg, with no ORM, and plain SQL files.
A lot of initial benefit for sure (and likely long term) from being Postgres centric.
I followed a blog post to implement a temporary queue in PG, started using it, and promptly forgot to ever come back to it.
Nice to see development on this has continued, going to try it out now.
I’m finding it valuable to keep more of the universe in Postgres from the beginning, until if/when it’s outgrown, instead of starting with NoSQL and spending inordinate amounts of time making the data relational.