I do mean revert or change resources. Making manual changes outside the context of an IaC tool is largely madness, in my opinion, although sometimes the situation warrants it.
Auditability and control are among the many facets of IaC. We require code changes to be approved by another developer, and similarly we require infrastructure changes to be approved by another developer. Regularly working outside the IaC tool would violate that policy.
It's in this approval step that the change-set (CFN) or plan (Terraform) should be carefully reviewed by a human. If someone has made manual changes, their reversion should show up here, and that should be unusual and eyebrow-raising. At that point, you either fix the IaC definition of the infrastructure, manually un-drift it as you say, or work around it by ignoring those specific changes.
(To reiterate, IMO no one should ever run CFN/Terraform unattended on prod infrastructure, and there should always be a step to review the change-set/plan.)
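As a small aid for that review step, here's a rough sketch that reads the JSON form of a saved Terraform plan (the output of `terraform show -json plan.out`) and flags updates and deletes, which is where reverted manual changes tend to show up. It assumes the `resource_changes[].change.actions` layout of Terraform's JSON plan output; `load_plan` and `summarize_plan` are hypothetical names. It's meant to make the eyebrow-raising bits stand out, not to replace a human actually reading the plan.

```python
import json

# Assumption: Terraform's JSON plan lists per-resource actions such as
# ["create"], ["update"], ["delete", "create"], ["no-op"] or ["read"].
RISKY_ACTIONS = {"update", "delete"}
QUIET_ACTIONS = (["no-op"], ["read"])


def load_plan(path: str) -> dict:
    """Load a plan previously exported with `terraform show -json plan.out > plan.json`."""
    with open(path) as f:
        return json.load(f)


def summarize_plan(plan: dict) -> list:
    """Return one line per resource that will actually change; '!!' marks updates/deletes."""
    lines = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions in QUIET_ACTIONS:
            continue  # nothing will change for this resource
        flag = "!!" if RISKY_ACTIONS & set(actions) else "  "
        lines.append(f"{flag} {rc['address']}: {'/'.join(actions)}")
    return lines
```

Something like `for line in summarize_plan(load_plan("plan.json")): print(line)` gives the reviewer a short list to eyeball before approving the apply.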
I'll also say that the sword cuts both ways when it comes to prod outages and manual changes. Not so long ago, I ran into a prod-impacting issue when turning on multi-AZ for an RDS instance. In the other regions that had multi-AZ enabled, someone had manually added an extra parameter to the RDS parameter group, one that was required for certain app functionality to work. No one ever added it back to CloudFormation and that knowledge was eventually lost. When we enabled multi-AZ in a different region, we expected no problems at all, but instead we ended up with a whole section of app functionality breaking.
(This would've been before drift detection was a thing in CloudFormation, but actually I don't think RDS parameter groups are supported in CloudFormation's drift detection right now anyway. [0])
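Without drift detection covering parameter groups, one possible workaround is to compare the explicitly set parameters of a known-good region against the one you're about to change. Here's a rough sketch, assuming boto3 and AWS credentials are available; `user_parameters` and `diff_parameters` are hypothetical helpers, and the group name and regions you'd pass in are up to you. It only looks at parameters whose `Source` is `user`, i.e. ones someone set by hand or via a template, not engine defaults.

```python
from typing import Dict, List


def user_parameters(group_name: str, region: str) -> Dict[str, str]:
    """Fetch the explicitly set (Source='user') parameters of an RDS DB parameter group."""
    import boto3  # imported lazily so diff_parameters works without AWS access

    rds = boto3.client("rds", region_name=region)
    params: Dict[str, str] = {}
    paginator = rds.get_paginator("describe_db_parameters")
    for page in paginator.paginate(DBParameterGroupName=group_name, Source="user"):
        for p in page["Parameters"]:
            params[p["ParameterName"]] = p.get("ParameterValue", "")
    return params


def diff_parameters(reference: Dict[str, str], candidate: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare a known-good group (reference) with another region's group (candidate)."""
    return {
        # set in the reference region but absent from the candidate
        "missing": sorted(k for k in reference if k not in candidate),
        # set in the candidate but not in the reference
        "extra": sorted(k for k in candidate if k not in reference),
        # set in both, with different values
        "changed": sorted(k for k in reference if k in candidate and reference[k] != candidate[k]),
    }
```

Anything under `missing` is a candidate for the kind of hand-added parameter that never made it back into the template.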
[0] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGui...