An artist friend asked me whether, and how, it would be possible to build a system that returns, for example, a list of the people in a room and roughly where they are.
The idea is to figure out the distances of those people to certain projected images and to change the images accordingly when these distances change.
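To make that concrete, here is a minimal sketch of the distance part only. It assumes some tracker already delivers rough floor positions in meters, keyed by a track ID; the names `distances_to_images` and `intensity` and the `near`/`far` thresholds are made up for illustration and are not tied to any particular sensor or library.

```python
import math

def distances_to_images(people, images):
    """Return {person_id: {image_name: distance}} in floor coordinates.

    people: {person_id: (x, y)}, hypothetical tracker output in meters
    images: {image_name: (x, y)}, assumed floor point in front of each projection
    """
    return {
        pid: {name: math.dist(pos, anchor) for name, anchor in images.items()}
        for pid, pos in people.items()
    }

def intensity(distance, near=1.0, far=5.0):
    """Map a distance to a 0..1 factor: 1 at `near` or closer, 0 at `far` or beyond.

    The thresholds are placeholder values, not measurements from a real space.
    """
    if distance is None:  # nobody in the room
        return 0.0
    t = (far - distance) / (far - near)
    return max(0.0, min(1.0, t))

people = {1: (0.0, 0.0), 2: (3.0, 4.0)}
images = {"left_wall": (0.0, 0.0)}
table = distances_to_images(people, images)
# Drive each image by whoever is closest to it.
closest = min(d["left_wall"] for d in table.values())
print(table, intensity(closest))
```

How the factor then changes an image (brightness, scale, crossfade) is up to the installation; the point is only that rough positions are enough for this.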
I have some experience with laser-based Time-of-Flight sensors, but their field of view is too narrow to cover the whole area (it would be a U-shaped space). I had a look at OpenCV, YOLOv8 and ByteTrack. That worked okay, but produced many false positives and a lot of jitter on my test footage, which was recorded from above. Two people standing next to each other were classified as a single elephant, bear and horse, all within the span of a single second. I looked into training my own model, but I won't be able to film footage in the space where this is supposed to take place.
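Two cheap mitigations for the misclassifications and the jitter, sketched under assumptions: drop every detection that is not a person, and smooth each track's position with an exponential moving average. The detection dicts, `keep_people` and `PositionSmoother` are hypothetical names for illustration; the only outside fact relied on is that "person" is class index 0 in the COCO label set the stock YOLOv8 weights use.

```python
PERSON_CLASS = 0  # "person" in the COCO label set used by stock YOLOv8 weights

def keep_people(detections, min_conf=0.5):
    """Discard non-person classes and low-confidence boxes.

    detections: list of dicts like {"cls": int, "conf": float, "xy": (x, y)};
    this layout is an assumption, adapt it to whatever the detector returns.
    """
    return [
        d for d in detections
        if d["cls"] == PERSON_CLASS and d["conf"] >= min_conf
    ]

class PositionSmoother:
    """Exponential moving average per track ID to damp frame-to-frame jitter."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # lower = smoother but laggier; 0.3 is a guess
        self.state = {}     # track_id -> (x, y)

    def update(self, track_id, x, y):
        if track_id not in self.state:
            # First sighting: nothing to average against yet.
            self.state[track_id] = (x, y)
        else:
            px, py = self.state[track_id]
            a = self.alpha
            self.state[track_id] = (a * x + (1 - a) * px, a * y + (1 - a) * py)
        return self.state[track_id]

smoother = PositionSmoother(alpha=0.5)
frame = [
    {"cls": 0, "conf": 0.8, "xy": (2.0, 4.0)},
    {"cls": 20, "conf": 0.9, "xy": (2.1, 4.1)},  # some non-person class
]
for det in keep_people(frame):
    print(smoother.update(1, *det["xy"]))
```

This does not fix two people being merged into one box; it only stops that box from being reported as an animal and keeps the reported position from jumping around.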
Is computer vision even the right option? If it is, does anybody know of good models for detecting people from above?
https://arxiv.org/search/?query=%22pose+estimation%22&search...
See also
https://github.com/pliablepixels/zmeventnotification
for a rather mature system that adds person and object detection to a security camera system.