Basically, you generate a textual representation of your UI and then compare it against a pre-recorded reference.
https://youtu.be/Sn057QrCUm8?t=470
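A minimal sketch of that text-snapshot comparison, assuming the UI can be described as a tree of nested dicts. The names `render`, `diff_against_reference`, the sample `ui` tree, and the `REFERENCE` string are all made up for illustration and do not come from the linked video or from any particular framework.

```python
import difflib

def render(node, depth=0):
    """Serialize a UI tree (nested dicts) into indented text lines."""
    props = node.get("props", {})
    # Sort props so the output is stable regardless of dict insertion order.
    attrs = " ".join(f'{key}="{value}"' for key, value in sorted(props.items()))
    lines = ["  " * depth + node["type"] + (f" {attrs}" if attrs else "")]
    for child in node.get("children", []):
        lines.extend(render(child, depth + 1))
    return lines

def diff_against_reference(ui, reference_text):
    """Return unified-diff lines; an empty list means the UI matches."""
    expected = reference_text.splitlines()
    actual = render(ui)
    return list(difflib.unified_diff(
        expected, actual, fromfile="reference", tofile="actual", lineterm=""))

# Hypothetical UI tree and the reference recorded from an earlier run.
ui = {
    "type": "Window",
    "props": {"title": "Login"},
    "children": [
        {"type": "TextField", "props": {"label": "User"}},
        {"type": "Button", "props": {"label": "Sign in", "enabled": True}},
    ],
}

REFERENCE = (
    'Window title="Login"\n'
    '  TextField label="User"\n'
    '  Button enabled="True" label="Sign in"\n'
)

if __name__ == "__main__":
    diff = diff_against_reference(ui, REFERENCE)
    print("\n".join(diff) if diff else "UI matches the reference")
```

In a real test the reference would normally live in a file checked into the repository, and a failing comparison would print the diff so the change can be reviewed and, if intended, re-recorded.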
Ubikam is a semantic camera that records two channels: a markup presentation channel and a "YAML Jazz" channel that describes the semantic content and the characters' state of mind.