Detection Engineering Fundamentals: From Signatures to Behaviors and Back Again

Overview

Detection engineering is the discipline of turning threat-informed hypotheses into alerts that fire on real attacks and stay quiet on everything else. A good detection catches the activity that matters; a bad detection either catches nothing or catches so much that the team ignores it. The difference between the two is the engineering: the threat-informed hypothesis that drove the detection, the data sources that support it, the rule logic that turns the data into a signal, and the testing that confirms the rule fires when it should and stays quiet when it should not.

The two broad categories of detection are signatures and behaviors. A signature is a specific indicator that an attack has happened: a specific file hash, a specific IP, a specific command-line string. A behavior is a pattern of activity that suggests an attack is in progress: an unusual process spawning a network connection, a user logging in from two countries in an hour, a service account being used interactively. Signatures are precise and brittle (the attacker changes the hash and the signature misses); behaviors are imprecise and robust (the attacker's behavior is harder to change than their tools). The right detection pipeline has both: signatures for the known-bad, behaviors for the unknown.

The interesting part of detection engineering is the second-order work: tuning the rules to suppress false positives without suppressing true positives, measuring the coverage gap between what the rules catch and what the threat model says matters, retiring rules that have outlived their usefulness, and writing detections that are easy to read six months later when the original author has moved on. The operational discipline is the part that separates a detection pipeline that works from one that looks like it works.

How it works

A detection pipeline has four components. The data sources: the logs, events, and telemetry that the detection rules run against. The detection logic: the rules, queries, or models that turn the data into a signal. The alert pipeline: the routing, deduplication, and prioritization that gets the signal to the right person. The measurement: the metrics that tell you whether the pipeline is working.

Data sources are the foundation. The most useful data sources for detection engineering are endpoint telemetry (process creation, file write, registry modification, network connection), authentication logs (logon success and failure, MFA challenge, password change), network telemetry (DNS query, HTTP request, TLS handshake, flow record), and cloud control plane (IAM role assumption, KMS key use, S3 bucket policy change, EC2 instance launch). The right data sources for a detection depend on the threat model; the wrong data sources are the ones you collect but never query, which is most of what organizations collect.

Detection logic is the rule. A detection rule has three parts: the data source (where to look), the condition (what to match), and the threshold (how much matching is enough to alert). A signature rule has a specific condition (a hash match, an IP match, a command-line string) and a low threshold (any match is suspicious). A behavior rule has a more abstract condition (an unusual process is doing X, a user is doing Y from Z) and a higher threshold (the behavior must be repeated or must be combined with other signals to be suspicious). The right threshold is the one that fires on real attacks and stays quiet on legitimate activity; finding the right threshold is the empirical part of detection engineering.
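The three parts of a rule can be sketched as a small data structure. This is a minimal illustration, not any particular SIEM's rule format: the `Rule` class, the flat-dict event shape, and the field names (`source`, `sha256`, `outcome`) are all assumptions made for the example, and the hash value is a placeholder rather than a real indicator.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Event = dict  # hypothetical event shape: a flat dict of field name to value


@dataclass(frozen=True)
class Rule:
    name: str
    source: str                         # where to look
    condition: Callable[[Event], bool]  # what to match
    threshold: int                      # how many matches are enough to alert

    def evaluate(self, events: Iterable[Event]) -> bool:
        """Return True when enough events from the source match the condition."""
        matches = sum(
            1 for e in events
            if e.get("source") == self.source and self.condition(e)
        )
        return matches >= self.threshold


# Signature rule: a specific condition and a threshold of one.
KNOWN_BAD_HASHES = {"0" * 64}  # placeholder value, not a real indicator
hash_rule = Rule(
    name="known_bad_hash",
    source="endpoint",
    condition=lambda e: e.get("sha256") in KNOWN_BAD_HASHES,
    threshold=1,
)

# Behavior rule: a more abstract condition and a higher threshold.
failed_logon_rule = Rule(
    name="repeated_failed_logons",
    source="auth",
    condition=lambda e: e.get("outcome") == "failure",
    threshold=5,
)
```

Keeping the threshold as a separate field, rather than burying it in the condition, makes it the one value that tuning has to change.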

The alert pipeline is the routing. A detection that fires on the right activity but routes to the wrong person is a detection that does not get investigated. The right alert pipeline routes by severity (critical alerts to on-call, lower-severity alerts to a queue) and by domain (network detections to the network team, identity detections to the identity team), and it attaches context (an alert that includes the relevant endpoint, user, and time of the suspicious activity is much more useful than an alert that just says "suspicious activity detected"). The right alert pipeline also deduplicates (one alert per incident, not one alert per event) and suppresses (no new alerts for an incident that is already being investigated).
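Routing and deduplication can be sketched in a few lines. The `Alert` fields, the destination names, and the one-hour window are assumptions chosen for illustration; a real pipeline would take them from its own alert schema and escalation policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    rule: str
    severity: str     # e.g. "critical", "high", "medium", "low"
    domain: str       # e.g. "network", "identity", "endpoint"
    host: str
    user: str
    timestamp: float  # seconds since epoch


def route(alert: Alert) -> str:
    """Pick a destination: severity first, then domain."""
    if alert.severity == "critical":
        return "on-call"
    return f"{alert.domain}-queue"


def deduplicate(alerts: list[Alert], window_seconds: float = 3600.0) -> list[Alert]:
    """Keep one alert per (rule, host, user) until the activity goes quiet."""
    last_seen: dict[tuple[str, str, str], float] = {}
    kept: list[Alert] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.rule, alert.host, alert.user)
        previous = last_seen.get(key)
        if previous is None or alert.timestamp - previous > window_seconds:
            kept.append(alert)
        # Update even when suppressed, so ongoing activity extends the window.
        last_seen[key] = alert.timestamp
    return kept
```

The deduplication key is the design decision that matters here: too narrow and one incident produces many alerts, too broad and separate incidents are merged into one.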

Measurement is the honesty check. The right metrics for a detection pipeline are: coverage (what percentage of the threat model does the detection pipeline catch), false positive rate (how many of the alerts are not real attacks), mean time to detect (how long from the start of the attack to the alert firing), and mean time to investigate (how long from the alert firing to the investigation starting). A detection pipeline that is not measured is a detection pipeline that has been working on faith; a measurement shows where the faith is justified and where it is not.
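As a rough sketch, three of these metrics reduce to simple arithmetic once the inputs exist. The function names and input shapes below are assumptions: techniques are plain string labels, dispositions are the labels analysts assign when closing alerts, and incidents are (attack start, alert time) pairs in seconds.

```python
from statistics import mean


def coverage(threat_model: set[str], detected: set[str]) -> float:
    """Fraction of threat-model techniques that at least one rule detects."""
    if not threat_model:
        return 0.0
    return len(threat_model & detected) / len(threat_model)


def false_positive_rate(dispositions: list[str]) -> float:
    """Fraction of triaged alerts that were closed as not a real attack."""
    if not dispositions:
        return 0.0
    return dispositions.count("false_positive") / len(dispositions)


def mean_time_to_detect(incidents: list[tuple[float, float]]) -> float:
    """Mean seconds from attack start to alert, over (start, alert) pairs."""
    return mean(alert - start for start, alert in incidents)
```

The hard part is not the arithmetic but the inputs: coverage needs a written threat model, and the false positive rate needs analysts to record a disposition on every alert they close.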

In practice

A useful first detection for most organizations is a credential-stuffing detection: log every authentication, and alert when a single source IP attempts to authenticate as many different users in a short window with a high failure rate. The signal is strong (one IP trying many accounts and mostly failing is either a credential-stuffing attack or shared egress, such as a corporate NAT or VPN gateway, that needs an allowlist entry), the data source is the authentication log (which most organizations already collect), and the threshold is empirical (the right number of failed attempts from one IP is usually somewhere between 20 and 100, depending on legitimate user behavior).
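A minimal sketch of this detection, assuming authentication events have already been parsed into tuples. The `AuthEvent` shape and the default values (a 10-minute window, 20 distinct users, a 90% failure rate) are illustrative assumptions to tune against real traffic, not recommendations.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AuthEvent(NamedTuple):
    timestamp: float  # seconds since epoch
    source_ip: str
    username: str
    success: bool


def credential_stuffing_sources(
    events: Iterable[AuthEvent],
    window_seconds: int = 600,
    min_users: int = 20,
    min_failure_rate: float = 0.9,
) -> set[str]:
    """Return source IPs that tried many distinct users and mostly failed."""
    users = defaultdict(set)
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for e in events:
        # Fixed (tumbling) windows keep the sketch short.
        key = (e.source_ip, int(e.timestamp // window_seconds))
        users[key].add(e.username)
        attempts[key] += 1
        if not e.success:
            failures[key] += 1
    return {
        key[0]
        for key, seen in users.items()
        if len(seen) >= min_users
        and failures[key] / attempts[key] >= min_failure_rate
    }
```

Fixed windows can split a burst across two buckets and miss it; a sliding window avoids that at the cost of more bookkeeping.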

A second useful detection is impossible travel: alert when a user authenticates from two countries in a window that is shorter than the time it would take to physically travel between them. The signal is strong (a user who authenticates from New York and then from Singapore 30 minutes later is either traveling many times faster than any commercial aircraft, routing traffic through a VPN or proxy, or has had their credentials compromised), the data source is the authentication log with geolocation enrichment, and the threshold is empirical (a four-hour window excludes most legitimate travel and still catches most credential compromise).
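One way to express "shorter than the time it would take to travel" is an implied-speed check on consecutive logons. The sketch below assumes each logon has already been enriched with coordinates; the `GeoLogon` shape, the 900 km/h speed ceiling (roughly airliner cruise speed), and the 500 km minimum distance are assumptions for illustration.

```python
import math
from typing import NamedTuple


class GeoLogon(NamedTuple):
    user: str
    timestamp: float  # seconds since epoch
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_impossible_travel(
    first: GeoLogon,
    second: GeoLogon,
    max_speed_kmh: float = 900.0,
    min_distance_km: float = 500.0,
) -> bool:
    """True when the implied speed between two logons exceeds the ceiling."""
    if first.user != second.user:
        return False
    distance = haversine_km(first.lat, first.lon, second.lat, second.lon)
    if distance < min_distance_km:
        return False  # short hops are dominated by geolocation error
    hours = abs(second.timestamp - first.timestamp) / 3600.0
    if hours == 0:
        return True
    return distance / hours > max_speed_kmh
```

The minimum distance exists because IP geolocation is coarse; without it, two nearby logons a few seconds apart can produce an absurd implied speed.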

A third useful detection is service account interactive use: alert when a service account (an account that is supposed to be used by a service, not by a human) authenticates interactively (an RDP session, an SSH session, a console logon). The signal is unambiguous (a service account that logs in interactively is either compromised or being misused), the data source is the authentication log with account-type enrichment, and the threshold is one (any interactive logon by a service account is worth investigating).
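Because the threshold is one, this detection is little more than a filter. The sketch assumes events are dicts with `user` and `logon_type` fields and that the set of service accounts comes from an inventory or directory lookup; the logon type labels are hypothetical and would need mapping from whatever the authentication log actually records.

```python
from typing import Iterable

# Hypothetical normalized labels for logon types that imply a human at a keyboard.
INTERACTIVE_LOGON_TYPES = {"console", "rdp", "ssh"}


def service_account_interactive_logons(
    events: Iterable[dict],
    service_accounts: set[str],
) -> list[dict]:
    """Return every interactive logon performed by a known service account."""
    return [
        e for e in events
        if e.get("user") in service_accounts
        and e.get("logon_type") in INTERACTIVE_LOGON_TYPES
    ]
```

The rule is only as good as the account-type enrichment: a service account missing from the inventory is invisible to it.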

A fourth useful detection is unusual child process: alert when a process spawns a child process that is unusual for it. The signal is that an attacker who has code execution on a host often uses the host's existing processes to do their work (running commands through cmd.exe, spawning PowerShell from a non-PowerShell parent, executing scripts through Word or Excel), and the parent-child relationship is a high-fidelity signal. The data source is endpoint telemetry with process-tree capture, and the threshold is empirical (the rule fires on combinations of parent and child that are unusual in the environment).
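"Unusual in the environment" can be approximated by counting parent-child pairs over a baseline period and flagging pairs that are rare or absent. This is a sketch of that frequency-baseline idea, assuming process events have been reduced to (parent, child) image-name tuples; the function names and the `min_seen` parameter are illustrative.

```python
from collections import Counter
from typing import Iterable

Pair = tuple[str, str]  # (parent image name, child image name)


def build_baseline(process_events: Iterable[Pair]) -> Counter:
    """Count how often each parent-child pair occurred in the baseline period."""
    return Counter((parent.lower(), child.lower()) for parent, child in process_events)


def unusual_pairs(
    baseline: Counter,
    new_events: Iterable[Pair],
    min_seen: int = 1,
) -> list[Pair]:
    """Return new pairs seen fewer than min_seen times in the baseline."""
    return [
        (parent, child)
        for parent, child in new_events
        if baseline[(parent.lower(), child.lower())] < min_seen
    ]
```

A baseline built during an active compromise will treat the attacker's process trees as normal, so the baseline period needs the same scrutiny as the rule.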

Common mistakes

The first mistake is collecting data you do not use. Most organizations collect orders of magnitude more log data than they query, and the cost of the collection (storage, retention, ingest licensing) is real. The right operational model is to instrument for the detections you intend to write, and to add data sources as needed when a new detection requires them. The wrong operational model is to collect everything and write detections later; the data ages out before the detections are written.

The second is rule logic that nobody reads. A detection rule that is a 200-line query with nested conditions and unexplained exceptions is a detection rule that will be silently broken the next time the schema changes. The right rule is short, named clearly, documented with the threat it addresses, and reviewed by at least one other person before it ships.

The third is no measurement. A detection pipeline without coverage, false positive rate, and time-to-detect metrics is a detection pipeline that is operating on faith. The faith may be justified, but the team cannot tell; they cannot prioritize the next detection to write, cannot justify the budget for the detection team, and cannot demonstrate the value of the work. The right operational model is to measure, even if the measurement is approximate.

The fourth is alert fatigue. A detection pipeline that fires on too much legitimate activity is a detection pipeline that gets ignored. The right tuning is empirical: ship the rule, watch the alerts, suppress the legitimate activity, retune. A noisy rule that ships is worse than a rule that is not shipped, because the noisy rule trains the team to ignore alerts.

The fifth is no test data. A detection rule that has never been tested against a true positive is a detection rule that might not fire when it should. The right operational model is to have a test dataset (a sample of the kind of activity the rule is meant to catch) and to run the rule against the test dataset before it ships, and to keep the test dataset for re-testing when the rule is updated.
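A test dataset can be as simple as two lists: samples the rule must fire on and samples it must stay quiet on. The sketch below assumes a rule is a function from an event dict to a boolean; the example rule, the `check_rule` helper, and the sample events are all illustrative rather than taken from any real environment.

```python
from typing import Callable

Event = dict
RuleFn = Callable[[Event], bool]

OFFICE_PARENTS = {"winword.exe", "excel.exe"}
SHELL_CHILDREN = {"cmd.exe", "powershell.exe"}


def office_spawns_shell(event: Event) -> bool:
    """Example rule: an Office application spawning a command shell."""
    return (
        event.get("parent", "").lower() in OFFICE_PARENTS
        and event.get("child", "").lower() in SHELL_CHILDREN
    )


def check_rule(
    rule: RuleFn,
    should_fire: list[Event],
    should_stay_quiet: list[Event],
) -> tuple[list[Event], list[Event]]:
    """Return (missed true positives, false alarms) for a rule."""
    missed = [e for e in should_fire if not rule(e)]
    false_alarms = [e for e in should_stay_quiet if rule(e)]
    return missed, false_alarms


# Illustrative samples, kept alongside the rule and re-run whenever it changes.
SHOULD_FIRE = [
    {"parent": "WINWORD.EXE", "child": "powershell.exe"},
    {"parent": "excel.exe", "child": "cmd.exe"},
]
SHOULD_STAY_QUIET = [
    {"parent": "explorer.exe", "child": "cmd.exe"},
    {"parent": "winword.exe", "child": "splwow64.exe"},
]
```

Running `check_rule(office_spawns_shell, SHOULD_FIRE, SHOULD_STAY_QUIET)` before shipping and after every change turns "the rule might not fire" into a check that either passes or names the samples it failed on.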

Defensive guidance

Start with the data sources you already have. The right first detections are usually written against the authentication log, the endpoint telemetry, and the DNS log, all of which most organizations already collect. The wrong first step is to deploy a new data source and write detections against it; the data ages out before the detections are written. Use what you have, then add data sources as needed.

Ship rules that are short, named clearly, and documented. A 200-line detection rule with nested conditions is a rule that will be silently broken the next time the schema changes. The right rule is the one a different analyst can read in two minutes and understand what it is meant to catch.

Measure coverage and false positive rate. Coverage is the percentage of the threat model the detection pipeline catches; false positive rate is the percentage of alerts that are not real attacks. Without these, the team is operating on faith, and faith is not a strategy. The measurement can be approximate; the important thing is that it exists and is reviewed.

Tune rules empirically. Ship the rule, watch the alerts, suppress the legitimate activity, retune. A rule that fires on every user logging in from a new device is a rule that trains the team to ignore alerts. The right tuning is the work after the rule ships, and it is the work that determines whether the rule is useful.

Build test data and re-test when rules change. A detection rule that has never fired against a true positive is a rule that might not fire when it should. The right operational model is a test dataset (a sample of the kind of activity the rule is meant to catch) and a re-test when the rule is updated.

Tie detections to the threat model. The right detection pipeline is one where every rule has a hypothesis about what attack it is meant to catch, and the threat model lists the attacks that matter. The wrong detection pipeline is one where rules are written based on whatever data is easy to query. The first approach catches the attacks that matter; the second catches the attacks that are easy to catch.
