I have used Gemini for reading and solving electronic-schematics exercises, and its results were good enough for me. It managed to solve roughly 50% of the exercises correctly and got the other 50% wrong. These were simple resistor-only circuits.
One time it missed the opposite polarity of two voltage sources in series and, instead of subtracting their voltages, added them together. I pointed out the mistake, and Gemini insisted that the voltage sources were not in opposite polarity.
Schematics in general are not AIs' strongest point. But when you describe in words what you want calculated from, say, an LRC circuit (no schematic, just a verbal description of that part of the circuit), GPT will often calculate it correctly. It still makes mistakes here and there, so always verify the calculation.
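For this kind of verification, the standard series-RLC formulas are quick to compute yourself. A minimal sketch (component values made up for illustration):

```python
import math

# Series RLC sanity check: recompute what the LLM claims, by hand.
R = 10.0   # resistance in ohms
L = 1e-3   # inductance in henries (1 mH)
C = 1e-6   # capacitance in farads (1 uF)

f0 = 1 / (2 * math.pi * math.sqrt(L * C))  # resonant frequency in Hz
Q = math.sqrt(L / C) / R                   # quality factor of a series RLC

print(f"f0 = {f0:.1f} Hz, Q = {Q:.3f}")
```

If the model's numbers disagree with a two-line check like this, the model is wrong.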
I think most people treat them like humans rather than like computers, and I think that is actually the more correct way to treat them. I'm not saying they are humans, but they are certainly a lot more like humans than whatever you seem to be expecting in your posts.
Humans make errors all the time. That doesn't mean having colleagues is useless, does it?
An AI is a colleague that can code very, very fast and has a very wide knowledge base and versatility. You may still know better than it in many cases and be more experienced than it, just as you might be with your colleagues.
And it needs the same kind of support that humans need. Complex problem? Plan ahead first. Tricky logic? Write unit tests. Research-grade problem? Discuss the solution with someone else before jumping into code, get some feedback, and iterate for 100 messages before you're ready to code. And so on.
There is also the Mercury LLM, which computes the answer directly as a 2D text representation. I don't know if you are familiar with Mercury, but you read that correctly: 2D text output.
Mercury might work better taking an ASCII diagram as input, or generating one as output; I'm not sure whether both the input and the output work in 2D.
Plumbing/electrical/electronic schematics are pretty important for AIs to understand if they are to assist us, but for the moment the success rate is pretty low. A 50% success rate on simple problems is very low; 80-90% on medium-difficulty problems is where they start being really useful.
It's not really the quality of the diagramming I am concerned with; it is the complete lack of understanding of electronic parts and their usual function. The diagramming is atrocious, but I could live with it if the circuit were at least borderline correct. Extrapolating from this: if we use the electronics schematic as a proxy for the kind of world model these systems have, then that world model has upside-down lanterns and anti-gravity as commonplace elements. Three-legged dogs mate with zebras and produce viable offspring, and short-circuiting transistors brings about entirely new physics.
It's hard for me to tell whether the solution is correct or wrong, because I've got next to no formal education in electronics and only the most basic "pay attention to the polarity of electrolytic capacitors" practical knowledge. But given how these things work, you might get much better results by asking it to generate a SPICE netlist first (or instead).
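A netlist is just plain, line-oriented text, which plays to an LLM's strengths better than 2D schematics do. A minimal sketch of what one might look like for a hypothetical voltage divider (component names and values invented for illustration), with the expected answer from the divider formula alongside:

```python
# A minimal SPICE-style netlist for a 9 V voltage divider, built as a string.
netlist = "\n".join([
    "* simple voltage divider",
    "V1 in 0 DC 9",   # 9 V source between node 'in' and ground (node 0)
    "R1 in out 2k",   # 2 kOhm resistor from 'in' to 'out'
    "R2 out 0 1k",    # 1 kOhm resistor from 'out' to ground
    ".op",
    ".end",
])

# Expected node voltage from V_out = V * R2 / (R1 + R2):
v_out = 9 * 1_000 / (2_000 + 1_000)
print(netlist)
print(f"expected V(out) = {v_out:.1f} V")
```

The nice part is that a netlist can be fed to a real simulator, so the LLM's circuit interpretation becomes checkable instead of a judgment call.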
I wouldn't trust it with 2D ASCII-art diagrams; my guess is there isn't enough of them in the training data. A typical jagged-frontier experience.
I remember when I was around 11-12 years old, my father brought home a computer that someone had given him for free. It had only a console and a black screen, and I figured out by myself how to open, edit, and lock files, navigate the file system, run programs like a calculator, and more, with no manual, no internet, and without even knowing English well enough.
1-2 years later, the teacher at school showed us how to program the turtle program, and I got it stuck in an infinite loop within 10 minutes. The teacher started swearing: "Your chair is smashing the Ethernet cable. The program is good and you fucked it up."
Around that time, I remember going to a barber shop for a haircut and stealing his nude/porn magazines. Even younger, I used to sneak up to my uncle's bedroom where he hid alcohol, drink half a bottle of whisky in an hour, and get knocked out every time.
I used to get into fights all the time from the age of 8, and my favorite activity at that age was climbing onto the roofs of abandoned houses at night and wandering around inside them.
My parents regularly tried to talk some sense into me, and I was beaten by my father for all the stuff I did.
When I was sixteen, I managed to steal a car by myself. I drove it around for 1-2 hours without knowing how to drive; I figured it out on the spot. Afterwards I returned the car to where it had been, and I didn't do anything else with it, but while driving I somehow managed to flatten a tire.
At university, at some point around age 20, I downloaded Kevin Mitnick's book from torrents, read it, and got inspired to phone my university, pretend to be a professor, and ask that a student (me) be passed in 2 courses. I passed the courses without even taking the exams.
It was around that time that a friend of mine, while playing the guitar at his house, looked me in the eyes and said, dead serious: "Man, if you go on like this, you will end up in jail." It was actually earth-shattering! It was the first time someone had talked some sense into me. I thought: this cannot continue, he is right.
Dialogue, you mean, as in conversation and debate, not dialog as in the on-screen element for interfacing with the user.
The group screaming the loudest is considered to be correct, which is pretty bad.
There needs to be an identity system in which people are filtered out when the conversation devolves into ad-hominem attacks, so that only debaters with the right balance of knowledge and no hidden agendas join the conversation.
Reddit, for example, is a good implementation of something like this, but the arbiter should not have that much power over people's words or their identities, for example the power to ban them.
> Even here it's the same, it's comments with replies but it's not truly a discussion.
For technology/science/computer subjects HN is very good, but for other subjects not so much, as is the case with every other forum.
But a solution will be found eventually. I think what is missing is an identity system that lets people hop between different ways of debating without being tied to a specific website or service. Solving this problem is not easy, so there will have to be a lot of experimentation before an adequate solution is established.
I don't think so. Serious attempts at producing data specifically for training have not been made yet. I mean high-quality data produced by anarcho-capitalists, not by corporations like Scale AI using hired workers, governed by the laws of one nation, etc.
Don't underestimate the determination of 1 million young people to produce perfect data within 24 hours to train a model to vacuum-clean their houses, if it means never having to do it themselves again, and maybe earning a little money on the side by creating the data.
Also, commercial software is consistently behind open source.
I only use open-source LLMs for writing (Qwen 32B on Groq) and an open-source editor, of course: Emacs.
If some people can write better using commercial LLMs (and commercial editors), by all means, but they are putting themselves at a disadvantage.
The next step for me is to find something open source for translation (I use Claude for the moment) and for programming (I use GPT currently). Within a year I will have a satisfying solution to both of these problems; I just haven't looked deeply enough yet.
The point of DeepSeek-OCR is not one-way image recognition and textual description of the pixels, but text compression into a generated image, and text restoration from that generated image. This video explains it well [1].
> From the paper: Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10×), the model can achieve decoding (OCR) precision of 97%. Even at a compression ratio of 20×, the OCR accuracy still remains at about 60%. This shows considerable promise for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs.
Its main purpose is to be a compression algorithm from text to image: throw away the text because it costs too many tokens, keep the image in the context window instead, generate some more text, and when text accumulates again, compress the new text into an image, and so on.
The argument is that pictures store a lot more information than words; "A picture is worth a thousand words," after all. Chinese characters are pictograms, so the idea doesn't seem that strange, but I don't buy it.
I am doing some experiments on removing text from an LLM's input and replacing it with its summary, and I have already reduced the context window by a factor of 7. I am still figuring out the best way to achieve this, but a factor of 10 is not far off. My experiments involve novel writing rather than general tasks, but simply replacing text with its summary still works very well.
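The shape of that experiment can be sketched in a few lines. This is only an illustration of the idea, not my actual setup: `summarize` stands in for an LLM call (here a trivial first-sentence heuristic so the sketch runs), and the character budget `MAX_CHARS` stands in for a token limit.

```python
# Rolling context compression: fold the oldest chunks into a summary
# whenever the accumulated history exceeds a budget.
def summarize(text: str) -> str:
    # Placeholder for an LLM summarization call: keep the first
    # sentence of each paragraph.
    return " ".join(p.split(". ")[0].rstrip(".") + "."
                    for p in text.split("\n") if p)

MAX_CHARS = 2000  # assumed budget, standing in for a token limit

def compress_context(history: list[str]) -> list[str]:
    while sum(len(h) for h in history) > MAX_CHARS and len(history) > 1:
        merged = history[0] + "\n" + history[1]   # two oldest chunks
        history = [summarize(merged)] + history[2:]
    return history

chapter = "The hero enters the city. Long description follows. More detail."
compressed = compress_context([chapter] * 100)
print(len(compressed), sum(len(h) for h in compressed))
```

The design choice is that older material gets summarized first, which matches novel writing well: recent scenes need detail, distant chapters only need their gist.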
If an image is worth so many words, why not use it for programming after all? There we go, visual programming again!
Combinators are math, though. There is a section in the paper covering graphs and charts: transforming them to text and then back again to images. They claim 97% precision.
> within a 10× compression ratio, the model’s decoding precision can reach approximately 97%, which is a very promising result. In the future, it may be possible to achieve nearly 10× lossless contexts compression through text-to-image approaches.
Graphs and charts should be represented as math, i.e. text; that's what they are anyway, even when rendered as images, and math is the much more economical representation.
The function f(x) = x can be represented by an image of 10x10 pixels, of 100x100 pixels, or of infinitely many pixels, while the formula itself stays the same handful of characters.
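The size difference is easy to make concrete. A sketch comparing the formula as text against a made-up 100x100 one-bit raster of the same line:

```python
# f(x) = x as text vs. as a raster image of the line y = x.
formula = "f(x)=x"
text_bytes = len(formula.encode("utf-8"))  # 6 bytes, at any "resolution"

width = height = 100
# 1-bit raster with the diagonal y = x set:
raster = [[1 if x == y else 0 for x in range(width)] for y in range(height)]
raster_bits = width * height  # 10,000 bits before any compression

print(text_bytes, raster_bits)
```

Scaling the raster up only makes the gap worse; the text stays 6 bytes no matter the resolution.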
> It should be a "right to not have product forced on you."
Even better, a "right to modify everything you own, in any way you like". Don't like the microcontroller installed by the manufacturer? Buy another one, with the correct firmware programmed from scratch, and swap it in.
We are already well into a new era of software, in which software can be programmed by software itself, especially in Rust. What is missing is a way to handle money transactions for software companies and their employees located anywhere in the world.
"Devices with no surprises": retail shops, in conjunction with electronics engineers, put new controllers in everything and re-sell it. Open-source software, auditable by anyone and modifiable at will.
Programs for every car, every refrigerator, etc. cannot be written by a company located in one place, not even in 10 places. It has to be a truly global company.
In other words, I want your device, I don't want your closed source software.
Are you willing to indemnify the manufacturer from any liability for anything that might go wrong on the car from then on? No factory warranty once you make changes. Potentially losing access to recall repairs because of the changes you made. In this age of software the entire car is increasingly designed holistically. The engineer might decide to use a particular grade of aluminum on a control arm knowing that the controller software is designed to never exceed certain limits.
> Are you willing to indemnify the manufacturer from any liability [..] No factory warranty once you make changes.
Car manufacturers have figured out how to make expensive cars that use good materials and are very safe as well. The problem is cheap cars, which can be much more defective and dangerous to drive.
There is a solution to that, though: 10-50 people combining their buying power, buying an expensive car, and time-sharing its usage. A mix between public transportation, a robo-taxi, and personal ownership.
> The engineer might decide to use a particular grade of aluminum on a control arm [..]
That's a real problem. A 3D printer, for example, might be off by a few millimeters in some dimension; the manufacturer accounts for that in software, and it prints well afterwards. Which materials are used certainly matters, but the properties of the metals used in the car can be made public, especially if the manufacturer is paid a premium, having just sold an expensive car instead of a cheap one.
The thing with software, though, is that it can be extended and modified without limit. I can have ten thousand more programs running on my computer tomorrow, with no change to anything physical. Physical things need to be manufactured, transported, and warehoused, so there is always a limit.
Consumers always want more: if 10 programs are available they want 10 programs; if 100 are available they want 100. It never ends. Proprietary software is not ideal there.
The era of intelligent robots is the end of washing machines and many other specialized machines, even tractors.
Two kinds of machinery are needed.
One: a very basic and cheap robot that goes around the neighborhood, gathers clothes into boxes, and transfers them to a designated room to be washed.
Two: state-of-the-art robotic hands mounted on the wall and connected to AC power (no batteries). The two arms would be controlled by computers, even a whole rack of them full of GPUs. The whole setup might draw 10 kW, and it would wash clothes by hand: fast, dexterous, and accurate. Expensive as well. In 3 minutes it would wash 100 t-shirts better than the best human on the planet, or any other non-intelligent machine.
Then the small basic robot returns the clothes to the house.
Yeah, why have a washing machine when you can just let your droid hand wash the dishes? At high enough temperatures and proper scrub that's likely going to be better and take less time.
Not a droid, just arms. If it needs to be fast, lift weights (just the arms, or even more), have high-quality cameras, and be connected to a lot of compute, it needs AC power from the grid.
With AC wires it had better not move around, especially where there is water. It has to be mounted on the wall.
Instead of saying "Not cloudy at all today", say "Clear sky today, with some scattered clouds though".
In general, always speak in a positive, straightforward way, even when you want to confuse someone.