DIGITAL SENTRY
Cybersecurity | May 17, 2024 | 10 min read

Secrets Management for Engineers: API Keys, Tokens, and the Operations That Keep Them Safe

Secrets are the credentials that grant access to systems: API keys, database passwords, OAuth tokens, signing keys, encryption keys. This article covers the patterns that keep secrets out of code, the patterns that rotate them safely, and the operational practices that catch the secrets that slip through anyway.

Overview

Secrets are the credentials that grant access to systems: API keys, database passwords, OAuth tokens, signing keys, encryption keys, certificates. A leaked secret is one of the most common breach vectors, and the operations to keep secrets safe are the operations that prevent most of those breaches. The pattern is not complicated: secrets do not go in code, secrets do not go in environment files that get committed, secrets do not get logged, and secrets rotate on a schedule that is shorter than the time an attacker needs to find and use them.

The reason secrets management is hard is that the easy patterns are wrong. Putting a database password in a `.env` file and committing the file is the easiest way to configure a service, and it is also one of the most common ways secrets leak. The right pattern takes more work: the service authenticates to a secret manager (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, Google Secret Manager) at startup, retrieves the secret at runtime, and never writes it to disk or to logs. The cost is real; the benefit is that the secret never lives anywhere an attacker can read it without specifically targeting the secret manager.

What makes secrets management a discipline rather than a one-time setup is the operational work after the secrets are in the manager: rotation (replacing the secret on a schedule), monitoring (alerting on unexpected reads), and revocation (the ability to invalidate a secret that has been leaked or compromised). A secret manager that holds secrets but does not rotate them is a vault that is harder to break into but still has the same old secrets. The right operational model is rotation on a schedule that is shorter than the time-to-detection of a leak.

How it works

A secret manager is a service that holds secrets and serves them to authenticated callers. A calling service authenticates to the secret manager (with an instance role, a workload identity, or a service account credential), requests a specific secret by name, and receives the secret value (or, in the more advanced pattern, a short-lived credential derived from the secret, so the caller never sees the underlying value). Either way, the caller never has to store what it receives: it holds the value in memory, uses it to authenticate to the next system, and discards it.
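The request flow described above can be sketched with a toy, in-memory manager. This is an illustration only, not the API of any real product: `ToySecretManager`, `AccessDenied`, the secret name `db/password`, and the caller name `billing-service` are all hypothetical, and the caller identity is assumed to have been authenticated already.

```python
class AccessDenied(Exception):
    """Raised when a caller is not allowed to read a secret."""


class ToySecretManager:
    """Hypothetical in-memory stand-in for a secret manager."""

    def __init__(self):
        self._secrets = {}   # secret name -> value
        self._policy = {}    # secret name -> set of callers allowed to read it
        self.audit_log = []  # (caller, secret name, decision) for every read attempt

    def put(self, name, value, allowed_callers):
        self._secrets[name] = value
        self._policy[name] = set(allowed_callers)

    def read(self, caller, name):
        # Every attempt is logged, whether or not it is allowed.
        allowed = caller in self._policy.get(name, set())
        self.audit_log.append((caller, name, "allow" if allowed else "deny"))
        if not allowed:
            raise AccessDenied(f"{caller} may not read {name}")
        return self._secrets[name]


manager = ToySecretManager()
manager.put("db/password", "example-value", allowed_callers={"billing-service"})

# The caller keeps the value in memory only, uses it, and lets it go out of scope.
password = manager.read("billing-service", "db/password")
```

The point of the sketch is the shape of the interaction: reads are by name, gated by caller identity, and recorded, which is what makes the later rotation and monitoring phases possible.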

The major secret managers (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, Google Secret Manager, 1Password Secrets Automation for teams that want something simpler) all implement this pattern. The differences are in the integration (how the service authenticates to the manager, how the secret is requested, how rotation is configured) and in the secondary features (audit logs, dynamic secrets, transit encryption). For most organizations, the right choice is the secret manager that is closest to the existing infrastructure (AWS organizations use AWS Secrets Manager, GCP organizations use Google Secret Manager), because it is already integrated with the existing IAM setup.

Dynamic secrets are an advanced pattern that the major secret managers support. Instead of storing a long-lived database password and serving it to the service, the secret manager creates a new database user on demand, returns a short-lived password (or a token) for that user, and revokes the user when the lease expires. The service never sees a long-lived database credential; the credential it gets is valid for hours, not years. The operational benefit is that a leaked dynamic secret is useful only until its lease ends, which is often before the leak can be exploited.
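A minimal sketch of the lease lifecycle, assuming a toy broker rather than any real secret manager: `ToyDynamicSecrets` and `Lease` are hypothetical names, and the "database user" is just a dictionary entry. A real system would create and drop the user on the upstream database.

```python
import hmac
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    username: str
    password: str
    expires_at: float  # seconds, on the same clock the broker uses


class ToyDynamicSecrets:
    """Hypothetical broker that issues short-lived credentials."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._active = {}  # username -> Lease

    def issue(self, service):
        # A fresh user and password per request; nothing long-lived is shared.
        username = f"{service}-{secrets.token_hex(4)}"
        lease = Lease(username, secrets.token_urlsafe(24), self._clock() + self._ttl)
        self._active[username] = lease
        return lease

    def is_valid(self, username, password):
        lease = self._active.get(username)
        if lease is None or lease.expires_at <= self._clock():
            return False
        return hmac.compare_digest(lease.password.encode(), password.encode())

    def revoke_expired(self):
        # A real system would also drop the database user here.
        now = self._clock()
        expired = [name for name, lease in self._active.items() if lease.expires_at <= now]
        for name in expired:
            del self._active[name]
        return expired
```

The injectable `clock` is a design choice for testability: it lets the expiry behavior be exercised without waiting for real time to pass.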

The workload identity pattern is the modern way to authenticate the service to the secret manager. Instead of holding a static credential (a long-lived API key that the service uses to authenticate), the service has an identity tied to its runtime environment (an AWS instance role, a GCP service account attached to the workload, a Kubernetes service account). The runtime environment proves that identity to the secret manager (with a signed token, a TLS certificate, or a similar mechanism), and the secret manager returns the requested secret. The service does not hold any long-lived credential that could be leaked.
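The signed-token variant of this exchange can be sketched as follows. This is a simplification under stated assumptions: real platforms typically sign identity tokens with asymmetric keys, whereas this sketch uses an HMAC with a shared demo key (`PLATFORM_KEY`) to stay within the standard library. The function names and token layout are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption for the sketch: a key shared by the platform and the manager.
PLATFORM_KEY = b"demo-only-platform-signing-key"


def platform_issue_token(workload, ttl_seconds=300, now=None):
    """The runtime platform vouches for a workload with a short-lived token."""
    now = time.time() if now is None else now
    claims = json.dumps({"sub": workload, "exp": now + ttl_seconds}).encode()
    signature = hmac.new(PLATFORM_KEY, claims, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(claims).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def manager_verify_token(token, now=None):
    """Return the workload name if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        claims_b64, signature_b64 = token.split(".")
        claims = base64.urlsafe_b64decode(claims_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(PLATFORM_KEY, claims, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # claims were not signed by the platform
    payload = json.loads(claims)
    if payload["exp"] <= now:
        return None  # token has expired
    return payload["sub"]
```

The service itself never stores the token issuer's key; it only presents a token the platform minted for it, and that token expires within minutes.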

In practice

A real secrets-management rollout has four phases. Phase 1 is inventory: find the secrets that are already in code, in environment files, in CI configuration, in documentation. Phase 2 is migration: move the secrets to the secret manager. Phase 3 is rotation: configure the secret manager to rotate the secrets on a schedule. Phase 4 is monitoring: instrument the secret manager to detect unusual reads.

Phase 1, inventory, is the longest phase. The places secrets hide in a typical organization: in `.env` files (often committed, sometimes ignored), in CI configuration (GitHub Actions secrets, GitLab CI variables, CircleCI contexts), in infrastructure configuration (Terraform variables, Helm values, Kubernetes secrets), in code (sometimes hardcoded for testing, sometimes accidentally committed), in documentation (README files, runbooks, Slack pinned messages). The right tool for inventory is a secret scanner (gitleaks, trufflehog, detect-secrets) that runs against the entire repository history and against every commit going forward.
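To make the scanning idea concrete, here is a deliberately small sketch of what a rule-based scanner does to a single file. It is not how gitleaks, trufflehog, or detect-secrets are implemented; those tools ship far larger rule sets, entropy checks, and history traversal. The rule names and `scan_text` are hypothetical, and the sample key is the well-known placeholder from AWS documentation.

```python
import re

# Three illustrative rules; a real scanner has hundreds.
RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded-credential": re.compile(
        r"(?:api[_-]?key|secret|password|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def scan_text(text):
    """Return (line number, rule name) for every rule that matches a line."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_no, rule_name))
    return findings


sample = (
    "import os\n"
    'AWS_KEY = "AKIAIOSFODNN7EXAMPLE"\n'
    'DB_PASSWORD = "hunter2hunter2"\n'
    'password = os.environ["DB_PASSWORD"]\n'
)
findings = scan_text(sample)
```

Note that the last line of the sample, which reads the password from the environment, is not flagged: the rules look for literal values assigned in place, which is the pattern the inventory phase is trying to find.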

Phase 2, migration, is the work of moving the secrets to the secret manager. For each secret found in Phase 1, the work is: create the secret in the manager, update the service to read the secret from the manager instead of from the file, deploy the updated service, verify the service is working, then remove the secret from the file and rotate it, because the old value has already been exposed. The right operational model is to do the migration in small batches (one service at a time, or one secret type at a time) so that any mistake is contained.

Phase 3, rotation, is the operational work that prevents a leaked secret from being useful. For each secret in the manager, the secret manager is configured to rotate the secret on a schedule (every 30 days for database passwords, every 90 days for API keys, a lease of a few hours for dynamic secrets). The rotation is automated (the secret manager rotates the secret on the upstream system and stores the new value), and the service picks up the new value at its next read. A service that does not read the secret dynamically has to be restarted to pick up the new value; the right operational model is to design the service to read dynamically so that the rotation is transparent.
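One way to make reads dynamic is a small cache that re-fetches the secret once it is older than a chosen age, so a rotated value is picked up without a restart. This is a sketch under assumptions: `CachedSecret` is a hypothetical helper, and `fetch` stands in for whatever call retrieves the secret from the manager in use.

```python
import time


class CachedSecret:
    """Holds a secret in memory and refreshes it after max_age_seconds."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch              # callable that returns the current value
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with a stand-in store; a real fetch would call the secret manager.
store = {"db/password": "old-value"}
db_password = CachedSecret(lambda: store["db/password"], max_age_seconds=300)
current = db_password.get()
```

The trade-off is the refresh interval: a shorter `max_age_seconds` narrows the window in which the service holds a rotated-out value, at the cost of more reads against the manager. A service would typically also re-fetch immediately when the upstream system rejects the cached credential.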

Phase 4, monitoring, is the safety net. The secret manager logs every read, every write, every rotation, every revocation. The logs are queried for unusual patterns: a read from an unexpected service, a read from an unexpected IP, a read at an unusual time, a large number of reads in a short window. The monitoring catches the secrets that slipped through Phase 1 and the operations that bypassed the secret manager.
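Two of the patterns listed above, a read from an unexpected service and a burst of reads in a short window, can be sketched as a pass over the audit log. The `ReadEvent` shape, the `find_anomalies` function, and the thresholds are hypothetical; real audit logs have their own schemas and would normally be queried in a log platform rather than in application code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadEvent:
    timestamp: float  # seconds
    caller: str
    secret: str


def find_anomalies(events, expected_callers, burst_limit, window_seconds):
    """Return (kind, event) alerts for unexpected callers and read bursts."""
    alerts = []
    recent = defaultdict(list)  # (caller, secret) -> read timestamps inside the window
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.caller not in expected_callers.get(event.secret, set()):
            alerts.append(("unexpected-caller", event))
        key = (event.caller, event.secret)
        window = [t for t in recent[key] if event.timestamp - t < window_seconds]
        window.append(event.timestamp)
        recent[key] = window
        # Alert once, at the read that first exceeds the limit.
        if len(window) == burst_limit + 1:
            alerts.append(("read-burst", event))
    return alerts


events = [ReadEvent(t, "billing", "db/password") for t in (0, 1, 2, 3)]
events.append(ReadEvent(10, "unknown-host", "db/password"))
alerts = find_anomalies(
    events,
    expected_callers={"db/password": {"billing"}},
    burst_limit=3,
    window_seconds=60,
)
```

The allow-list of expected callers is the part that takes real effort to maintain; without it, the only signal available is volume.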

Common mistakes

The first mistake is putting secrets in `.env` files and committing the `.env` file (or shipping the `.env` file in the deployment artifact). The `.env` file is the easiest way to configure a service, and it is also one of the most common ways secrets get leaked. The fix is to put the secrets in the secret manager and have the service read them at startup, with no `.env` file in the deployment.

The second is putting secrets in CI configuration without restricting access. CI secrets are accessible to anyone who can run the CI workflow, which is often a wider group than the people who should have access to the secret. The fix is to use the secret manager from the CI runner (with a workload identity, not a long-lived credential) and to keep the long-lived credentials out of CI entirely.

The third is logging secrets: a stack trace that includes the API key, an error message that includes the database connection string, a debug log that includes the request body with the Authorization header. Any of these is a secret leak that the secret manager cannot prevent. The fix is to audit the logging code for places where secrets can be written, and to redact or filter them.
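As a backstop to the code audit, a logging filter can redact obvious secret shapes before a record is written. The sketch below uses the standard `logging` module; the patterns, `redact`, and `RedactingFilter` are illustrative assumptions and will not catch every format. It also does not cover tracebacks attached via `exc_info`, which the formatter renders separately.

```python
import logging
import re

# (pattern, replacement) pairs for a few common shapes; not exhaustive.
_PATTERNS = [
    # Authorization: Bearer <token> / Authorization: Basic <credentials>
    (re.compile(r"(authorization:\s*(?:bearer|basic)\s+)\S+", re.IGNORECASE),
     r"\1[REDACTED]"),
    # password=..., api_key=..., token=... in query strings or key-value text
    (re.compile(r"((?:password|api[_-]?key|token)=)[^\s&]+", re.IGNORECASE),
     r"\1[REDACTED]"),
    # scheme://user:password@host in connection strings
    (re.compile(r"(://[^:/\s]+:)[^@\s]+(@)"), r"\1[REDACTED]\2"),
]


def redact(text):
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Rewrites each record's message with secrets masked."""

    def filter(self, record):
        # Render the message with its arguments first, then redact the result.
        record.msg = redact(record.getMessage())
        record.args = None
        return True


handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("billing")
logger.addHandler(handler)
```

Attaching the filter to the handler rather than to a single logger means it applies to every record that handler emits, including records from child loggers.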

The fourth is not rotating. A secret that has been in the manager for three years, that has been read by hundreds of services, that has been in the logs of dozens of monitoring tools, that has been in the inboxes of dozens of engineers, is a secret that has likely been leaked somewhere along the way. The fix is rotation on a schedule that is shorter than the time-to-detection of a leak.

The fifth is treating the secret manager as a vault that holds secrets, not a system that rotates them. The right secret manager does both; the wrong secret manager (or the right secret manager used incorrectly) holds secrets forever and serves them to anyone who authenticates. The right operational model is rotation as a default, not as an option.

Defensive guidance

Run a secret scanner against the repository history and against every commit going forward. The scanner (gitleaks, trufflehog, detect-secrets) catches secrets that are already in code and secrets that are about to be added. The scanner should run on every pull request and should fail the build if it finds a secret. The scanner is the safety net for the operations that bypassed the secret manager.

Use workload identity for the service-to-secret-manager authentication. The service has a runtime identity (AWS instance role, GCP service account, Kubernetes service account) that it uses to authenticate to the secret manager. The service does not hold any long-lived credential that could be leaked.

Configure rotation on a schedule that is shorter than the time-to-detection of a leak. For database passwords, 30 days is a reasonable starting point; for API keys, 90 days; for dynamic secrets, the lease is the rotation (a few hours). The rotation should be automated (the secret manager rotates the secret on the upstream system) so that the rotation does not depend on someone remembering to do it.
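The starting-point intervals above can be expressed as a simple overdue check, useful as a report even when the rotation itself is automated elsewhere. The inventory shape, the secret names, and `overdue_secrets` are hypothetical; the intervals are the ones suggested in the text.

```python
from datetime import datetime, timedelta, timezone

# Starting-point intervals from the guidance above; tune per organization.
ROTATION_INTERVALS = {
    "database-password": timedelta(days=30),
    "api-key": timedelta(days=90),
}


def overdue_secrets(inventory, now):
    """inventory: iterable of (name, kind, last_rotated) tuples."""
    overdue = []
    for name, kind, last_rotated in inventory:
        if now - last_rotated >= ROTATION_INTERVALS[kind]:
            overdue.append(name)
    return overdue


now = datetime(2024, 5, 17, tzinfo=timezone.utc)
inventory = [
    ("orders-db", "database-password", now - timedelta(days=45)),
    ("payments-api", "api-key", now - timedelta(days=45)),
    ("search-api", "api-key", now - timedelta(days=90)),
]
overdue = overdue_secrets(inventory, now)
```

Dynamic secrets are left out of the table on purpose: their lease is the rotation, so there is nothing to schedule.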

Audit the logging code for places where secrets can be written. The audit is a code review that looks for an API key in a log statement, a database connection string in an error message, or an Authorization header in a debug print. The audit catches the leaks that the secret manager cannot prevent.

Treat the secret manager as a system to be monitored, not a vault to be trusted. Log every read, every write, every rotation, every revocation. Alert on unusual patterns: a read from an unexpected service, a read from an unexpected IP, a read at an unusual time, a large number of reads in a short window. The monitoring is the operational safety net that catches the secrets that slipped through.

Rotate immediately when a leak is suspected. If a secret is in a public GitHub repository, in a Slack message, in an email that went to the wrong recipient, or in a backup that was not encrypted, the secret is leaked and the rotation has to happen now. A policy of 'we will rotate on schedule' is the wrong policy when a specific secret is known to be leaked; the right policy is 'we rotate now, and the scheduled rotation continues afterward'.
