The problem being solved is getting AI to distinguish individual objects within visual data. Before SAM, you had to train a model on specific objects: label data for those objects and train a model to recognize them specifically. That becomes problematic given the variety of objects in the world, the settings they can appear in, and their orientation in an image. SAM can identify objects it has never seen before, i.e. objects that were not part of its training data.
Once you can determine which pixels belong to which object automatically, you can start to utilize that knowledge for other applications.
If you have SAM showing you all the objects, you can use other models to identify what each object is, understand its shape/size, understand depth/distance, etc. It's a foundation model to build on for any application that takes visual data as input.
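As a toy sketch of what "knowing which pixels belong to which object" buys you: once a segmenter like SAM hands back a boolean mask for an object, basic shape/size properties fall out with a few lines of numpy. The mask below is made up for illustration, not real SAM output.

```python
import numpy as np

# Pretend this is a per-object boolean mask returned by a segmenter
# (True = pixel belongs to the object), on a tiny 6x8 image for clarity.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True  # a 3x4 rectangular "object"

# Area in pixels -- a simple proxy for object size
area = int(mask.sum())

# Tight bounding box: (row_min, row_max, col_min, col_max)
rows, cols = np.where(mask)
bbox = tuple(int(v) for v in (rows.min(), rows.max(), cols.min(), cols.max()))

print(area)  # 12
print(bbox)  # (2, 4, 3, 6)
```

The same mask can be fed to a classifier (crop the bounding box), a depth model (average depth over the masked pixels), and so on, which is what makes the segmentation step foundational.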
yep, the value is pretty clear from his demo. It goes from dozens of clicks to identify an object within an image down to a single click. SAM does almost exactly what you'd want as a human in every one of his examples.