I’m guessing you’ve never worked in customer support. The failure modes of mistakes would be nasty. Even smart people swap bulbs around when diagnosing faults.
Simplicity (good usability) is almost always crushingly hard to achieve, doubly so for hardware.
Calling things “simple” is often a sign of shallow thinking in my experience - something a customer or manager might naively say but an engineer cannot (because they have to deal with all of the real requirements).
For example, the engineers who build cars can’t say “you simply push a button to start a car” - for an engineer, the complexity behind that simple operation runs very, very deep.
> For the customer-replacement case, you simply tell the customer to replace just one bulb at a time
Just imagining the customer support for this is gonna give me nightmares.
“Sir, you need to make sure your vehicle’s ignition is turned to accessory mode. Then wait for the light to blink twice, that’s the vehicle’s confirmation that it correctly identified the new light. If it blinks three times, it can’t confirm the light’s location, so you should try removing it and re-inserting it. If it blinks four times, that means you didn’t replace the bulbs in the correct order so you need to initiate a manual reset procedure by going to the driver’s seat and…”
CAN frames only have space for 8 bytes of payload, unless you upgrade to CAN-FD at a significant complexity cost. For the sake of a light bulb, you could make it work by being sufficiently clever. You could even use all 8 bytes for serial number, and then use existence of the message itself to turn on the bulb. Have it turn off after 100ms of timeout.
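To make that concrete, here is a rough Python sketch with no real CAN hardware involved: the 8-byte payload is nothing but the serial number, and the arrival of a matching frame doubles as the "on" command. The `Bulb` class, the `encode_serial` helper, and the explicit `now` timestamps are all made up for illustration.

```python
import struct

TIMEOUT_S = 0.100  # bulb turns off 100 ms after the last matching frame


def encode_serial(serial: int) -> bytes:
    """Pack a 64-bit serial number into the 8-byte CAN payload."""
    return struct.pack(">Q", serial)


class Bulb:
    """Hypothetical bulb that stays lit only while frames keep arriving."""

    def __init__(self, serial: int):
        self.serial = serial
        self.last_seen = None  # time of the last frame carrying our serial

    def on_frame(self, payload: bytes, now: float) -> None:
        # The payload holds only a serial number; a match means "stay on".
        if len(payload) == 8 and struct.unpack(">Q", payload)[0] == self.serial:
            self.last_seen = now

    def is_on(self, now: float) -> bool:
        return self.last_seen is not None and (now - self.last_seen) <= TIMEOUT_S
```

The controller would have to resend the frame more often than every 100 ms to keep the bulb lit, which is bus load you pay for every bulb that is on.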
It's really not a sustainable approach to try to address nodes on a CAN bus by serial number, though. CAN is content addressed rather than receiver addressed. Due to the way arbitration works on the bus, it's invalid for two nodes to transmit frames with the same CAN identifier. Both nodes win arbitration, keep transmitting, and then collide in the rest of the frame. The arbitration mechanism breaks down and results in error frames, at which point the CAN bus is in a degraded state.
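A toy model of bitwise arbitration shows why a shared identifier is a problem. This is only a sketch of the ID field on an 11-bit bus; the `arbitrate` function is a name invented here, and it ignores everything after the identifier.

```python
def arbitrate(ids, width=11):
    """Return the indices of the nodes still transmitting after the ID field.

    The bus behaves like a wired-AND: a dominant 0 overrides a recessive 1.
    A node that sends recessive but reads back dominant has lost and backs off.
    """
    contenders = set(range(len(ids)))
    for bit in range(width - 1, -1, -1):  # identifiers go out MSB first
        sent = {n: (ids[n] >> bit) & 1 for n in contenders}
        bus = min(sent.values())  # any 0 pulls the bus dominant
        contenders = {n for n in contenders if sent[n] == bus}
    return contenders
```

With distinct identifiers, `arbitrate([0x120, 0x0A5, 0x7FF])` leaves only node 1, the lowest ID. With `arbitrate([0x100, 0x100])` both nodes survive, so both carry on into the data field, and the first differing bit shows up as a bit error.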
That would preclude a CAN enabled bulb from being able to send telemetry back, at least until the bulb was provisioned an identifier. That could be done by an ECU sending a frame with the bulb's serial number and assigned identifier. You still need a zero-conf discovery protocol, though, and so you're back to transmitting before provisioning. You could work around all that, but it's a lot of work.
Stepping back a bit, running a car's CAN bus over a light bulb socket is going to cause some practical reliability problems. Compared to a wire harness going into an ECU, a user-serviceable bulb socket is going to be much more prone to intermittent connections from vibration, as well as oxidation and wear. Intermittent connections on CAN_H/CAN_L tend to cause a ton of frame errors and significantly degrade the overall bus performance, often to the point of system failure. When a node racks up enough errors, it is compelled by the standard to go into a BUS-OFF state where it isolates itself from the bus. Because it's a bus and all the nodes share the same two wires, it's pretty much impossible to diagnose where an intermittent connection is without trial and error.
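For a feel of how quickly a flaky connection pushes a node off the bus, here is a simplified Python model of the transmit side of CAN fault confinement. It assumes the usual rules (transmit error counter up by 8 on a failed transmission, down by 1 on a success, error-passive above 127, bus-off above 255) and leaves out the receive counter and bus-off recovery. The `ErrorState` class is illustrative, not any driver's API.

```python
class ErrorState:
    """Simplified fault confinement for one transmitting node."""

    def __init__(self):
        self.tec = 0  # transmit error counter

    @property
    def state(self):
        if self.tec > 255:
            return "bus-off"
        if self.tec > 127:
            return "error-passive"
        return "error-active"

    def tx_error(self):
        # A failed transmission costs 8 points.
        if self.state != "bus-off":
            self.tec += 8

    def tx_ok(self):
        # A successful transmission earns back only 1 point.
        if self.state != "bus-off" and self.tec > 0:
            self.tec -= 1
```

Under this model, 32 failed transmissions in a row are enough to reach bus-off, and each failure takes 8 clean frames to undo, so a connection that drops out for a few tens of milliseconds at a time can plausibly do it.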
I appreciate the detailed insight! Great point on a subtle way that individual bulbs are non-ideal. I'm learning CAN now, mainly for use in drones. I have two STM32 FDCAN peripherals talking to each other; the basics seem easy, but the protocols that go on top seem complicated! I suppose this is due to managing a decentralized network. I.e., at first CAN seemed to offer a bus that simplifies wiring and resists noise, but the more subtle and interesting point seems to be a common API: hardware access is handled by individual nodes, and communication goes through this API layer on top of the hardware. If you control the whole network, it can look like the first case, but the interesting things happen, as you describe, when the nodes come from different manufacturers and are swappable.
Ie, with CAN, each node only needs to do reg reads/writes/datasheet-spelunking for a narrow part; the other nodes just need to know the API that sits on top of the hardware.
CAN only really works out well in a complex system if you have full control over the addressing scheme. Addressing and prioritization are one and the same. Unintuitively, prioritization isn't about the importance of a message so much as it is about the message's urgency. A pretty common approach is to use "rate monotonic" prioritization. The basic idea is that higher rate messages have higher priority (lower address) than lower rate messages.
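As a small illustration of rate-monotonic assignment, the sketch below sorts messages by period and hands out identifiers in ascending order, so the fastest message gets the lowest ID and wins arbitration. The message names, periods, and `base_id` are invented for the example.

```python
def assign_ids(periods_ms, base_id=0x100):
    """Map message name -> CAN ID; the shortest period gets the lowest ID."""
    # Ties on period are broken by name so the result is deterministic.
    ordered = sorted(periods_ms.items(), key=lambda kv: (kv[1], kv[0]))
    return {name: base_id + i for i, (name, _) in enumerate(ordered)}


ids = assign_ids({"battery_temp": 1000, "wheel_speed": 10, "steering_angle": 20})
```

Here `wheel_speed` (10 ms) ends up at 0x100, `steering_angle` (20 ms) at 0x101, and `battery_temp` (1000 ms) at 0x102. A real scheme would usually leave gaps between IDs so new messages can be slotted in later.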
There are rules of thumb about never overloading a CAN bus beyond, say, 50% utilization. That's because systems with poor prioritization management tend to start falling over around there. With a well-thought-out scheme, it's possible to push a CAN bus fairly close to 100% utilization. I built several safety-critical systems that pushed 80% utilization on average. At that level, you really need to rely on redundancy rather than simple robustness, though. A CAN bus running at 80% falls over very hard when you have a flaky physical connection somewhere.
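A back-of-the-envelope way to estimate utilization for periodic traffic is to sum frame time over period for every message. The sketch below assumes classic CAN with 11-bit identifiers and 8 data bytes, and takes roughly 135 bits per frame as a worst case including stuff bits and interframe space. That figure, the 500 kbit/s bitrate, and the function name are assumptions for illustration.

```python
def bus_utilization(periods_s, bitrate=500_000, bits_per_frame=135):
    """Fraction of bus time used by periodic messages with the given periods."""
    frame_time = bits_per_frame / bitrate  # seconds on the wire per frame
    return sum(frame_time / period for period in periods_s)


# One message every 1 ms, one every 10 ms, one every 100 ms.
load = bus_utilization([0.001, 0.010, 0.100])
```

For that example `load` comes out at about 0.30, and nearly all of it is the single 1 ms message, which is why the fastest messages dominate any utilization budget.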