Hmm... does it really work as advertised? Can it really run some basic object detection in real-time on-device, and at what resolution/framerate combination(s)?
Sorry for being skeptical, just that it sounds too incredible to believe. I've toyed around with the same chips you're using, OV2640 with ESP32-S3, and at any decent resolution it simply seemed to lack the processing power to run even simple motion detection, let alone anything fancier. Sure, it was kind of fine at low-res and it kinda works for spotting an elephant in a room, but it was completely incapable of detecting small fast-moving targets (roaches) under less than perfect lighting conditions (bathroom ceiling lights, decent but not overly bright). Best it could do was serving a 1024p@5fps MJPEG stream over a network to a more powerful machine for further processing.
I haven't tried, but it must be doable at low-res, low-framerate conditions, where the CPUs still have plenty of time left between frames and the frames aren't big (so maybe it can even fully decode those JPEGs, not just extract the DC coefficients for a quick-and-dirty hack).
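For anyone wondering what the DC-coefficient hack amounts to: a JPEG block's DC term is proportional to that 8x8 block's mean brightness, so comparing per-block means between two frames is a cheap stand-in for it. Below is a rough sketch in plain Python, not anything from the project. The function names, the 8-pixel block size and the threshold of 12 are my own assumptions for illustration, and it works on already-decoded grayscale frames rather than pulling coefficients out of a JPEG stream.

```python
def block_means(frame, block=8):
    """Reduce a grayscale frame (list of rows of 0-255 ints) to per-block means.

    Rows/columns that do not fill a whole block are ignored.
    """
    h = len(frame) - len(frame) % block
    w = len(frame[0]) - len(frame[0]) % block
    means = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            total = sum(sum(frame[y][bx:bx + block]) for y in range(by, by + block))
            row.append(total / (block * block))
        means.append(row)
    return means


def changed_blocks(prev_means, curr_means, threshold=12.0):
    """Return (block_row, block_col) for blocks whose mean moved by more than threshold."""
    return [
        (r, c)
        for r, (prev_row, curr_row) in enumerate(zip(prev_means, curr_means))
        for c, (p, q) in enumerate(zip(prev_row, curr_row))
        if abs(p - q) > threshold
    ]


# Hypothetical 16x16 test frames: a dark scene, then a bright 8x8 object
# appearing in the lower-left block.
prev = [[0] * 16 for _ in range(16)]
curr = [row[:] for row in prev]
for y in range(8, 16):
    for x in range(0, 8):
        curr[y][x] = 200

hits = changed_blocks(block_means(prev), block_means(curr))
```

This also shows why small targets are hard: anything much smaller than a block gets averaged into the background, so its change in the block mean can fall below the threshold.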
It's just the advertising page that sounds kinda unbelievable: low power, night vision, image analysis on the device, perfect for wildlife monitoring, can detect pests on crops (implying high resolution unless we're talking about deer and rabbits lol), etc etc.
The video on the site really should highlight some of the capabilities. It did a pretty poor job of “showing off” what the device can do, which is typically what video is good for.
> It has been tested extensively with many processors based on the Arm Cortex-M Series architecture, and has been ported to other architectures including ESP32.
Very interesting! Looking forward to seeing more embedded AI devices coming out.
More NLP based, but here is an article on an effort to build Transformers micromodels to run on embedded devices. The model in this example is under 1MB. Goal would be to ultimately convert this from ONNX to TFLite.
If you want to use it for video, I'd wait until the ESP32-P4 releases.
You could do some rudimentary AI on the ESP32-S3, but you can't send video at a decent rate. The P4 will have H.264 encoding, MIPI connectors, and a 400 MHz dual-core.
MJPEG isn't a great way to stream video, and the AI has to be super small to fit into SRAM. You could use flash over SPI, but that's not the greatest.
You can get some decent results, but not for anything I thought was interesting for actual use.
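To put rough numbers on the SRAM point: a back-of-the-envelope sketch comparing raw (uncompressed) frame buffers against on-chip memory. The 512 KB figure is the ESP32-S3's nominal internal SRAM and is used here as an assumption for the budget. The usable amount is lower once the RTOS, Wi-Fi stack and the model itself are loaded, and boards with external PSRAM change the picture. The helper names are made up for illustration.

```python
# Nominal internal SRAM of the ESP32-S3 (assumed budget; usable memory is smaller).
SRAM_BYTES = 512 * 1024

# Common OV2640 output sizes (width, height).
RESOLUTIONS = {
    "QVGA": (320, 240),
    "VGA": (640, 480),
    "SVGA": (800, 600),
    "UXGA": (1600, 1200),
}


def raw_frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame (1 byte/pixel grayscale, 2 for RGB565)."""
    return width * height * bytes_per_pixel


def fits_in_sram(width, height, bytes_per_pixel, budget=SRAM_BYTES):
    """True if a single raw frame fits in the given budget, ignoring everything else."""
    return raw_frame_bytes(width, height, bytes_per_pixel) <= budget


for name, (w, h) in RESOLUTIONS.items():
    gray = raw_frame_bytes(w, h, 1)
    rgb565 = raw_frame_bytes(w, h, 2)
    print(f"{name}: gray {gray} B (fits: {fits_in_sram(w, h, 1)}), "
          f"RGB565 {rgb565} B (fits: {fits_in_sram(w, h, 2)})")
```

Even before any model weights, a single decoded SVGA frame in RGB565 is already larger than the whole on-chip SRAM, which is why on-device work tends to end up at QVGA-ish sizes or on JPEG data.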
Nice job, though a smaller PCB would've been more practical? The ESP32-CAM is definitely a bit tricky to work with software-wise.
I'm not sure how much sense it actually makes to run anything onboard unless you want to take a snapshot every few seconds and then spend a while classifying it. I eventually just resorted to streaming at SVGA and decent framerates and then processing the output on another machine.
Also there is https://github.com/jomjol/AI-on-the-edge-device which digitizes analog water, gas, power and other meters. It uses the ESP32 camera module and local inference for character recognition. I found this very useful during the gas price hikes to identify effective saving measures. The inference speed is anything but realtime, but it is fast enough for this use case. I took measurements every 5 minutes; the project states: "Values smaller than 2 minutes do not make sense, as this is the time for one detection."
The ESP32 is a pretty standard sensor chip and the camera version is also widely available. A basic version of OpenCV was ported ~3 years ago, so it's cool but not sure what's new here.
I realize hardware projects are, er, harder, and I buy that this project is real, but at the same time: signup pages and fundraisers aren't supposed to be Show HNs. Please see https://news.ycombinator.com/showhn.html. So I've taken Show HN out of the title now.
It does surprisingly well on object detection for its value. The tested use cases so far are: animal detection, sophisticated version of a motion sensor, plant recognition, people counter. Most likely Edge Impulse can work even a bit better.
I signed up for updates! I have been looking for something exactly like this for home automation, to detect occupancy, pets, etc. I am very excited to pair it with HomeAssistant!