Ideally you only retry on error codes that guarantee no backend logic has executed yet.
This prevents retry amplification.
It also has the benefit that you can retry all types of RPCs, including non-idempotent ones.
One example is a server reporting that it is overloaded and can't serve requests right now (load shedding).
With no risk of retry amplification you can retry ASAP, which gives much better latency. No exponential backoff required.
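A minimal sketch of that policy, under stated assumptions: the `Status` enum, the `OVERLOADED` code, and `call_with_retries` are hypothetical names for illustration, not any real RPC library's API. The one assumption that matters is that the server returns `OVERLOADED` only when it rejected the request before running any backend logic.

```python
from enum import Enum, auto


class Status(Enum):
    OK = auto()
    OVERLOADED = auto()         # hypothetical: request was shed before any logic ran
    DEADLINE_EXCEEDED = auto()  # backend may already have done (expensive) work
    INTERNAL = auto()


# Only codes that guarantee no backend logic executed are retryable.
RETRYABLE = frozenset({Status.OVERLOADED})


def call_with_retries(send, max_attempts=3):
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns (last_status, attempts_used). There is no sleep between attempts:
    a shed request cost the server almost nothing, so retrying immediately
    does not amplify load.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status not in RETRYABLE:
            return status, attempt
    return status, max_attempts
```

Because nothing ran on the backend, the same loop is safe for non-idempotent RPCs too. A `DEADLINE_EXCEEDED` result is returned to the caller after the first attempt rather than retried.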
Retrying deadline-exceeded errors seems dangerous. The requests that time out are the most expensive ones, so you are amplifying exactly those: even if you only retry 20% of all RPCs, you could still 10x server load.
Ideally the server starts load shedding before it grinds to a halt, since shed requests can be retried without risk of amplification.
Having longer RPC deadlines helps the server process the backlog without timeouts.
That said, deadline handling is a complex topic and YMMV depending on the service in question.
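To make the amplification argument concrete, here is a rough back-of-the-envelope model. Everything in it is an assumption for illustration: load is measured in units of one typical RPC, a timed-out RPC is assumed to burn `cost_ratio` units before hitting its deadline, and every retry is assumed to time out again (the worst case). The function name and the numbers are made up, not measurements.

```python
def relative_load(retry_fraction, cost_ratio, retries=1):
    """Server work per RPC, in units of one typical (cheap) RPC.

    retry_fraction: share of RPCs that hit the deadline and get retried
    cost_ratio:     work a timed-out RPC burns, relative to a typical RPC
    retries:        extra attempts per timed-out RPC; worst case assumed,
                    i.e. every retry also runs until the deadline
    """
    cheap = 1.0 - retry_fraction
    expensive = retry_fraction * cost_ratio * (1 + retries)
    return cheap + expensive


# Assumed numbers: 20% of RPCs time out, each costing 23x a typical RPC.
without_retries = relative_load(0.2, 23, retries=0)  # about 5.4
with_one_retry = relative_load(0.2, 23, retries=1)   # about 10
```

If the retried 20% were ordinary cheap RPCs (`cost_ratio=1`), one retry would only raise load to about 1.2x. The damage comes from the retried requests being the expensive ones.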