We've run a couple experiments and have found that our open vision language model Moondream works better than YOLOv11 in general cases. If accuracy matters most, it's worth trying our vision language model. If you need real-time results, you can train YOLO models using data from our model. We have a space for video redaction, that is just object detection, on our Hugging Face. We also have a playground online to try it out.