Exception management

Trust Architecture Playbook: Automation pillar

Every automation program has exceptions. The key is to govern them with the same rigor applied to the standard model; otherwise, they quietly become the standard model.

90/10 model

The 90/10 model is a design target. The expectation is that 90% of the certificate estate moves through standardized, repeatable automation patterns, and the 10% that cannot is managed as a time-boxed exception with compensating controls and a defined retirement path. That 10% is a known risk position the organization has accepted, under specific conditions, for a specific period. When those conditions change or that period expires, the exception is retired.

Without that discipline, the first exception is legitimate, the second is similar, and the third is slightly less clear-cut but gets approved because the bar for rejection is not well defined. By the time the register has 30 entries, half could have been automated with a modest engineering investment, several have passed their review date without action, and the exception process has become a workaround to avoid satisfying the readiness gates.

Keeping exceptions exceptional

The volume of exceptions is one of the most important leading indicators of program health. A decline in exceptions indicates a maturing program closing the gap between the standard model and the full estate. A stable or growing number of exceptions, particularly one where average exception age is increasing, indicates either that the pattern catalog has gaps or that the process has become a bypass mechanism.
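As a rough illustration of the two signals described above, the sketch below computes the open-exception count and the average exception age, then compares two snapshots. The `OpenException` type, the field names, and the `is_warning_trend` rule are assumptions made for this example and are not part of the playbook.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class OpenException:
    """Hypothetical minimal view of one open register entry."""
    exception_id: str
    approved_on: date


def register_snapshot(entries: list[OpenException], as_of: date) -> dict[str, float]:
    """Return the open-exception count and the average age in days."""
    ages = [(as_of - e.approved_on).days for e in entries]
    return {
        "open_count": len(entries),
        "average_age_days": mean(ages) if ages else 0.0,
    }


def is_warning_trend(previous: dict[str, float], current: dict[str, float]) -> bool:
    """Flag a register that is not shrinking while its entries get older."""
    not_shrinking = current["open_count"] >= previous["open_count"]
    getting_older = current["average_age_days"] > previous["average_age_days"]
    return not_shrinking and getting_older
```

Comparing snapshots taken at each review cadence, rather than looking at a single count, is what makes the trend visible.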

Pay particular attention to exceptions that cluster around a specific platform class, use case, or organizational segment. One exception for a legacy appliance is a corner case. Five exceptions for the same appliance model across different business units is a pattern catalog gap.
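One way to surface this kind of clustering is to group open exceptions by platform class and count how many distinct business units they span. The sketch below assumes each exception can be reduced to a `(platform_class, business_unit)` pair; the function name and the default threshold of three business units are illustrative choices, not values from the playbook.

```python
from collections import defaultdict


def catalog_gap_candidates(
    exceptions: list[tuple[str, str]], min_business_units: int = 3
) -> dict[str, int]:
    """Map each clustered platform class to its exception count.

    `exceptions` holds (platform_class, business_unit) pairs, one per open
    exception. A class is reported when its exceptions span at least
    `min_business_units` distinct business units (an assumed threshold).
    """
    units_by_class: dict[str, set[str]] = defaultdict(set)
    counts: dict[str, int] = defaultdict(int)
    for platform_class, business_unit in exceptions:
        units_by_class[platform_class].add(business_unit)
        counts[platform_class] += 1
    return {
        platform: counts[platform]
        for platform, units in units_by_class.items()
        if len(units) >= min_business_units
    }
```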

Exceptions are not failures. If used correctly, they provide a controlled path for edge cases that the standard model cannot yet address. Conversely, if used incorrectly, they become the path of least resistance. Enforcement is the key to keeping exceptions exceptional: hold a regular review cadence, retire them aggressively, and treat a growing exception register as the program health signal it is.

Exception register

Each register entry captures the following fields, with the required content for each:

  • Exception ID: Unique identifier and approval date.

  • Affected service / certificates: Service name, tier, owner, and environment.

  • Reason for exception: Why the standard pattern or profile doesn’t work today. Be specific; "legacy constraint" is not sufficient.

  • Compensating controls: Manual checks, shortened alert windows, extra monitoring, or temporary scope restriction. Must be specific and testable.

  • Expiry / Review date: Mandatory revalidation date. No exception survives without a defined expiry.

  • Retirement plan: How and when the exception will be eliminated. Acceptable paths: platform upgrade, connector development, decommission.

  • Approval authority: Named approver or governance body. Central PKI/Security must approve all Tier 0/1 exceptions.
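The register fields above could be modeled as a record with a completeness check, roughly as sketched here. This is a minimal illustration under stated assumptions: the class and field names, the integer tier, the `VAGUE_REASONS` set, and the literal approver string are all hypothetical and would need to match whatever system actually holds the register.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical examples of reasons too vague to accept.
VAGUE_REASONS = {"legacy constraint", "legacy", "tbd", "n/a"}
# Retirement paths named as acceptable in the register definition.
RETIREMENT_PATHS = {"platform upgrade", "connector development", "decommission"}
# Assumed label for the central approving body.
CENTRAL_APPROVER = "Central PKI/Security"


@dataclass
class ExceptionRecord:
    exception_id: str
    approved_on: date
    service_name: str
    tier: int
    owner: str
    environment: str
    reason: str
    compensating_controls: list[str]
    review_date: Optional[date]
    retirement_path: str
    approver: str


def validate(record: ExceptionRecord) -> list[str]:
    """Return a list of problems; an empty list means the entry is complete."""
    problems = []
    if record.reason.strip().lower() in VAGUE_REASONS:
        problems.append("reason is not specific")
    if not record.compensating_controls:
        problems.append("no compensating controls")
    if record.review_date is None:
        problems.append("no expiry / review date")
    if record.retirement_path not in RETIREMENT_PATHS:
        problems.append("retirement path not accepted")
    if record.tier in (0, 1) and record.approver != CENTRAL_APPROVER:
        problems.append("Tier 0/1 needs Central PKI/Security approval")
    return problems
```

A check like this only confirms that the fields are present and well formed; whether a compensating control is actually testable still needs human review.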

Exception qualification

Not every deviation from the standard pattern should qualify as an approved exception. Use the following criteria to distinguish legitimate exceptions from gaps that require remediation:

  • Legitimate exception: Legacy platform that cannot support any approved enrollment protocol; third-party SaaS dependency with no programmable certificate management; regulatory constraint that prevents automation of a specific certificate type.

  • Remediation required: Unknown owner; unknown install location; manual deployment that could be automated with engineering investment; missing monitoring. These are readiness gate failures, not exceptions.

  • Pattern catalog problem: If multiple services in the same platform class require exceptions, the issue is likely a missing pattern, not a legitimate one-off. Escalate to pattern catalog review.

Exception handling worksheet

  • Define what qualifies as an exception (legacy constraint, third-party dependency, unsupported install point, regulatory restriction).

  • Require compensating controls (monitoring, shorter alert thresholds, documented manual procedure). Compensating controls must be specific and testable — not aspirational.

  • Time-box all exceptions. No exception is permanent. Maximum initial term: 90 days with mandatory governance review before extension.

  • Remove exceptions as platforms become automatable. Platform upgrades, connector development, and decommissions are the acceptable retirement paths.

  • Track exception frequency by platform class. Persistent exceptions in a platform class signal a missing pattern — escalate to pattern catalog review.
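The time-boxing rule in the worksheet can be expressed as a small check. The sketch below assumes the 90-day maximum initial term stated above; the function names and the dictionary of expiry dates keyed by exception ID are hypothetical.

```python
from datetime import date, timedelta

# Maximum initial term from the worksheet.
MAX_INITIAL_TERM_DAYS = 90


def initial_expiry(approved_on: date, requested_days: int = MAX_INITIAL_TERM_DAYS) -> date:
    """Return the expiry date, rejecting terms outside 1 to 90 days."""
    if not 1 <= requested_days <= MAX_INITIAL_TERM_DAYS:
        raise ValueError(
            f"initial term must be between 1 and {MAX_INITIAL_TERM_DAYS} days"
        )
    return approved_on + timedelta(days=requested_days)


def overdue(expiries: dict[str, date], today: date) -> list[str]:
    """Return the IDs of exceptions whose expiry date has already passed."""
    return sorted(eid for eid, expiry in expiries.items() if expiry < today)
```

Extensions are deliberately not modeled: under the worksheet they require a governance review, which is a decision rather than a calculation.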

Break-glass guardrails

  • If break-glass issuance is required, use dedicated emergency profiles with narrow scope and short validity rather than production profiles with elevated access.

  • Separate emergency approvers from day-to-day automation operators wherever possible.

  • Log: who invoked the process, why it was used, what was issued, where it was deployed, and how the service was validated.

  • Run a post-event review within a defined window (recommended: 5 business days for Tier 0/1) and convert emergency changes into governed standard patterns or corrective actions.

  • Pre-stage emergency profiles and permissions for Tier 0/1 services before an incident occurs — not during one.
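The logging and post-event review guardrails could be captured along the following lines. This is a sketch under assumptions: the `BreakGlassEvent` fields mirror the five logged items above but their names are invented here, and the deadline calculation counts Monday to Friday only, ignoring public holidays.

```python
from dataclasses import dataclass, fields
from datetime import date, timedelta


@dataclass(frozen=True)
class BreakGlassEvent:
    """Hypothetical log record for one break-glass invocation."""
    invoked_by: str    # who invoked the process
    reason: str        # why it was used
    issued: str        # what was issued
    deployed_to: str   # where it was deployed
    validation: str    # how the service was validated
    occurred_on: date


def missing_fields(event: BreakGlassEvent) -> list[str]:
    """Return the names of text fields that were left blank."""
    return [
        f.name
        for f in fields(event)
        if isinstance(getattr(event, f.name), str) and not getattr(event, f.name).strip()
    ]


def review_due(occurred_on: date, business_days: int = 5) -> date:
    """Return the date that falls `business_days` working days after the event.

    Weekends are skipped; public holidays are not considered (an assumption).
    """
    current = occurred_on
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current
```

For example, under these assumptions an event on Friday 12 January 2024 would have its review due by Friday 19 January 2024.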