These exist, e.g. the Elgato Stream Deck. It's basically a keypad whose buttons (the count varies by model) are each a small LCD display that you can program to show and do anything you want (so you can make it do the 'on air' display thing you mention). Its main use case is letting streamers switch between scenes in their streaming software, but I use it for video conferencing (with OBS's virtual camera) to switch between full-screen camera view and desktop sharing, and to do things like mute/unmute.
Is it possible to use it to control a Zoom session without virtualizing the audio/video input devices? Discord has a local API for that but I haven't found a way to control Zoom calls from another app.
Not sure what you mean by 'control a Zoom session', but yes, I use it with Zoom. I use OBS to composite video and some audio, and I select the OBS virtual camera as the camera device in Zoom. For audio I usually use the straight microphone stream, because getting OBS-processed audio into Zoom is fiddly to set up (you have to do the mixing outside OBS because OBS doesn't have a virtual audio device).
If you mean that you just want to mute/unmute a Zoom session, then also yes: you configure the Stream Deck to output key press events, so you'd program it to send the keyboard shortcuts that you want. I'm not sure whether Zoom has separate mute and unmute shortcuts, though. If it only has a toggle and you also change settings with the regular keyboard/mouse, the state displayed on the Stream Deck can get out of sync with the actual state of the software, and that would probably be finicky and/or a lot of work to solve.
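To make the sync problem concrete, here is a minimal sketch in Python. It is only a model: `App` and `OneWayButton` are hypothetical stand-ins made up for illustration, not Stream Deck or Zoom APIs, and it assumes the app exposes a single toggle shortcut rather than separate mute and unmute shortcuts.

```python
class App:
    """Hypothetical stand-in for a conferencing app with one mute toggle shortcut."""

    def __init__(self) -> None:
        self.muted = False

    def toggle_mute(self) -> None:
        # What the app does whenever it receives the toggle shortcut or a UI click.
        self.muted = not self.muted


class OneWayButton:
    """Hypothetical deck button: it sends the shortcut and guesses the resulting state."""

    def __init__(self, app: App) -> None:
        self.app = app
        self.shows_muted = False  # what the button's display claims

    def press(self) -> None:
        self.app.toggle_mute()                   # send the keystroke
        self.shows_muted = not self.shows_muted  # flip the icon without checking


app = App()
button = OneWayButton(app)

button.press()     # deck mutes: the app and the button agree
app.toggle_mute()  # user unmutes with the mouse; the button never hears about it

in_sync = button.shows_muted == app.muted
```

After the second step the button still shows "muted" while the app is live, and every later press keeps the two exactly inverted until something resets them.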
I'm still tweaking my setup, but this piece of kit combined with a good-quality webcam, a Blue Yeti mic on an arm, and OBS is already so much better than the clunky default experience of each of these video conferencing tools: I can control Zoom/MS Teams/Skype in a uniform way, have ultimate control over what part of the desktop I share and how I pre-process audio, show my desktop with myself in the corner, and so on. It's like programming with vim: yes, I spent an inordinate amount of time 20+ years ago getting proficient with it, but using it now just feels like an extension of my brain, like using a Hilti hammer drill vs. a bargain-bin Chinese piece of junk.
Thanks for the explanation! Sorry, I got distracted and forgot to write a reply. I don't need the full range of features offered by OBS yet, but I'm strongly considering setting it up just to have control over the video stream. I'm using the Zoom (hah) portable recorder for audio since it offers outstanding audio quality, convenient mic controls and basic signal processing. The problem with controlling apps via keystrokes is exactly what you describe: since the communication is one-way, the state of the toggle buttons inevitably gets out of sync. Reading the UI state back through the accessibility API might help, but I'm not holding my breath.
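For what it's worth, the read-back idea could look roughly like the Python sketch below. Everything in it is an assumption for illustration: `FakeApp` stands in for the conferencing app, and `is_muted` stands in for whatever read-back channel turns out to exist (an accessibility query, for instance). No real Zoom or accessibility API is called.

```python
from typing import Callable


class FakeApp:
    """Hypothetical stand-in for the conferencing app; not a real Zoom API."""

    def __init__(self) -> None:
        self.muted = False

    def toggle_mute(self) -> None:
        # What the app does when it receives the mute shortcut.
        self.muted = not self.muted

    def is_muted(self) -> bool:
        # Stand-in for a read-back channel such as an accessibility query.
        return self.muted


class SyncedButton:
    """Button that never guesses: it reads the app state before and after acting."""

    def __init__(self, send_toggle: Callable[[], None],
                 read_muted: Callable[[], bool]) -> None:
        self.send_toggle = send_toggle
        self.read_muted = read_muted
        self.shows_muted = read_muted()

    def refresh(self) -> None:
        # Call this on a timer so changes made in the app's own UI show up.
        self.shows_muted = self.read_muted()

    def set_muted(self, want_muted: bool) -> None:
        # Only send the toggle keystroke if the app is not already in that state.
        if self.read_muted() != want_muted:
            self.send_toggle()
        self.refresh()


app = FakeApp()
button = SyncedButton(app.toggle_mute, app.is_muted)

button.set_muted(True)   # sends the shortcut once
app.toggle_mute()        # user unmutes in the app's own UI
button.refresh()         # periodic poll picks up the change
button.set_muted(False)  # already unmuted, so no keystroke is sent
```

The design choice here is to treat the app as the single source of truth and the button display as a cache of it, so a toggle-only shortcut can still be used to reach a specific target state. How reliable the read-back is in practice depends entirely on what the real app exposes.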