Incident Response Runbook
Classification: Confidential
Version: 1.0
Quick Reference
| Incident Type | First Step | Escalation |
|---|---|---|
| Compromised credentials | Rotate IAM keys | Security team |
| Data breach | Isolate S3 bucket | Legal + Security |
| DoS attack | Enable WAF | AWS Support |
| Malware in images | Quarantine bucket | Security team |
| KMS key compromised | Disable key, create new | AWS Support |
1. Security Alert Response
1.1 Lambda Error Alarm
Trigger: lambda-errors > 5 in 5 minutes
Steps:
- Check CloudWatch Logs: /aws/lambda/image-processor-proc
- Identify error pattern (input validation, timeout, permissions)
- If input validation failures: possible attack vector
- If permissions errors: check IAM role changes
- Document findings in incident ticket
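The triage in the steps above can be roughed out with a small filter that sorts log lines into the three error patterns. This is a minimal sketch: the match strings other than Lambda's "Task timed out" message are assumptions and need adjusting to what the function actually logs.

```shell
# Classify Lambda log lines read from stdin into validation, timeout,
# and permissions errors. The validation and permissions patterns are
# assumptions about the log format, not taken from the function's code.
classify_errors() {
  awk '
    /Task timed out/                { timeout++; next }
    /AccessDenied|not authorized/   { permissions++; next }
    /ValidationError|Invalid input/ { validation++; next }
    END {
      printf "validation=%d timeout=%d permissions=%d\n", validation, timeout, permissions
    }'
}

# Example (AWS CLI v2, needs credentials):
# aws logs tail /aws/lambda/image-processor-proc --since 15m \
#   --filter-pattern ERROR --format short | classify_errors
```

A validation count that dominates the others points toward the attack-vector branch; a permissions count points toward recent IAM role changes.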
Recovery:
- Deploy fix if code-related
- Update input validation if attack-related
- Notify users if service impacted
2. Data Breach Response
2.1 S3 Bucket Compromise
Trigger: GuardDuty finding, unusual access patterns
Immediate Actions (0-15 min):
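# 1. Block all object access in the affected bucket. The deny applies to every
#    principal, responders included; bucket-level actions (for example s3:ListBucket)
#    are not covered because Resource lists only object ARNs.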
aws s3api put-bucket-policy --bucket image-processor-ACCOUNT \
--policy '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Principal":"*","Action":"s3:*","Resource":["arn:aws:s3:::image-processor-ACCOUNT/*"]}]}'
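# 2. Enable S3 Object Lock (requires bucket versioning; deletion is only blocked
#    for object versions that carry a retention period or legal hold)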
aws s3api put-object-lock-configuration --bucket image-processor-ACCOUNT \
--object-lock-configuration '{"ObjectLockEnabled":"Enabled"}'
# 3. Capture access logs
aws s3 cp s3://image-processor-logs-ACCOUNT/s3-access-logs/ ./forensics/s3-logs/ --recursive
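Because put-bucket-policy replaces the whole policy, it may be worth saving the existing one before running step 1. A minimal sketch, assuming the output directory and file name shown here (both hypothetical):

```shell
# Save the current bucket policy so the original statements can be
# restored after the incident. Output path is an assumption.
snapshot_bucket_policy() {
  local bucket="$1" outdir="${2:-./forensics}"
  mkdir -p "$outdir"
  # get-bucket-policy returns the policy document as a string in the Policy field.
  aws s3api get-bucket-policy --bucket "$bucket" \
    --query Policy --output text > "$outdir/bucket-policy-before.json"
}

# Usage: snapshot_bucket_policy image-processor-ACCOUNT
```

If the bucket has no policy, the call fails and the file is left empty, which is itself worth recording in the incident ticket.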
Investigation (15-60 min):
- Review S3 access logs for unauthorized IPs
- Check CloudTrail for API call anomalies
- Identify compromised credentials
- Scope data exposure (list affected objects)
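For the access-log review, a per-IP request count is a reasonable first pass. The sketch below assumes the logs were downloaded to ./forensics/s3-logs/ and are in the standard S3 server access log format, where the timestamp occupies two whitespace-separated fields and the remote IP is therefore field 5.

```shell
# Count requests per remote IP across downloaded S3 server access logs
# and print the 20 busiest addresses.
top_ips() {
  local logdir="${1:-./forensics/s3-logs}"
  find "$logdir" -type f -exec cat {} + \
    | awk '{ print $5 }' \
    | sort | uniq -c | sort -rn | head -n 20
}

# Usage: top_ips ./forensics/s3-logs
```

Addresses that do not belong to known application or office ranges are candidates for the unauthorized-IP list.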
Containment (1-4 hours):
- Rotate all IAM credentials
- Revoke suspicious sessions
- Enable CloudTrail log file validation
- Notify AWS Security
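One way to revoke suspicious role sessions is an inline deny policy keyed on the token issue time, which is the same pattern the IAM console applies when revoking active sessions. A minimal sketch; the role and policy names in the usage comment are hypothetical.

```shell
# Print a policy that denies every action for sessions whose token was
# issued before the cutoff (defaults to the current UTC time).
revoke_sessions_policy() {
  local cutoff="${1:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  cat <<EOF
{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*","Condition":{"DateLessThan":{"aws:TokenIssueTime":"$cutoff"}}}]}
EOF
}

# Attach it to a role (role and policy names are assumptions):
# aws iam put-role-policy --role-name image-processor-role \
#   --policy-name RevokeOlderSessions \
#   --policy-document "$(revoke_sessions_policy)"
```

Sessions issued after the cutoff are unaffected, so legitimate workloads recover as soon as they obtain fresh credentials.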
Recovery (4-24 hours):
- Create new bucket with hardened policy
- Restore from backup if needed
- Re-enable services incrementally
- Post-incident review
3. KMS Key Compromise
Trigger: KMS key state alarm, unauthorized KeyUsage events
Immediate Actions:
# 1. Disable the key (prevents new encryption/decryption)
aws kms disable-key --key-id <key-id>
# 2. Create new key
aws kms create-key --description "Emergency replacement key"
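# 3. Update Lambda environment (--environment replaces the full variable set,
#    so include every existing variable alongside the new key ID)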
aws lambda update-function-configuration \
--function-name image-processor-proc \
--environment "Variables={...,KMS_KEY_ID=<new-key-id>}"
Recovery:
- Re-encrypt all S3 objects with new key (a disabled key cannot decrypt, so the old key must be re-enabled for the duration of the copy)
- Update all references to old key
- Schedule old key for deletion (30-day window)
- Audit all KeyUsage CloudTrail events
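Re-encryption can be sketched as an in-place copy with new server-side encryption settings. This assumes the bucket name and key ID are passed in and that the caller can decrypt with the old key at the time of the copy.

```shell
# Copy every object onto itself with SSE-KMS under the new key.
# Assumption: the old key is usable for decryption while this runs.
reencrypt_bucket() {
  local bucket="$1" new_key_id="$2"
  aws s3 cp "s3://$bucket/" "s3://$bucket/" --recursive \
    --sse aws:kms --sse-kms-key-id "$new_key_id"
}

# Usage: reencrypt_bucket image-processor-ACCOUNT <new-key-id>
```

On a versioned bucket this creates new object versions; older versions stay encrypted under the old key and need separate handling before that key is deleted.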
4. DoS Attack Response
Trigger: Lambda throttles, CloudWatch spike
Immediate Actions:
# 1. Reduce Lambda concurrency to limit blast radius
aws lambda put-function-concurrency \
--function-name image-processor-proc \
--reserved-concurrent-executions 1
# 2. Enable S3 Requester Pays (deter attackers)
aws s3api put-bucket-request-payment \
--bucket image-processor-ACCOUNT \
--request-payment-configuration '{"Payer":"Requester"}'
Mitigation:
- Enable AWS Shield (if escalated)
- Add WAF rules on a CloudFront distribution in front of S3 (WAF cannot attach to an S3 bucket directly)
- Implement request rate limiting
- Block suspicious IP ranges
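Once the mitigations are in place, the emergency settings from the immediate actions need to be rolled back. A minimal sketch, assuming the function had no reserved concurrency and the bucket was owner-pays before the incident:

```shell
# Undo the emergency DoS settings once traffic is back to normal.
restore_after_dos() {
  local fn="$1" bucket="$2"
  # Remove the reserved-concurrency cap so the function draws on the
  # account's unreserved pool again.
  aws lambda delete-function-concurrency --function-name "$fn"
  # Return request and transfer charges to the bucket owner.
  aws s3api put-bucket-request-payment --bucket "$bucket" \
    --request-payment-configuration '{"Payer":"BucketOwner"}'
}

# Usage: restore_after_dos image-processor-proc image-processor-ACCOUNT
```

If the function had a reserved-concurrency value before the incident, restore that value with put-function-concurrency instead of deleting the setting.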
5. Malware Detection
Trigger: GuardDuty S3 finding, unusual file patterns
Immediate Actions:
# 1. Quarantine affected objects
aws s3 cp s3://image-processor-ACCOUNT/uploads/suspicious.jpg \
s3://image-processor-ACCOUNT/quarantine/suspicious.jpg
# 2. Remove from uploads
aws s3 rm s3://image-processor-ACCOUNT/uploads/suspicious.jpg
# 3. Tag for investigation
aws s3api put-object-tagging \
--bucket image-processor-ACCOUNT \
--key quarantine/suspicious.jpg \
--tagging "TagSet=[{Key=Status,Value=Quarantined},{Key=Date,Value=$(date -u +%F)}]"
Analysis:
- Download quarantine file to isolated environment
- Scan with ClamAV or VirusTotal API
- Check file metadata for origin
- Review upload source IP in access logs
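The ClamAV step can be wrapped so the verdict is easy to paste into the incident ticket. This sketch assumes clamscan is installed in the isolated environment and relies on its documented exit codes (0 no virus found, 1 virus found, 2 error); VirusTotal lookups are not covered here.

```shell
# Scan one downloaded file with ClamAV and print a one-word verdict.
scan_file() {
  local file="$1" rc=0
  clamscan --no-summary "$file" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "clean" ;;
    1) echo "infected" ;;
    *) echo "scan-error" ;;   # includes clamscan missing or unreadable file
  esac
}

# Usage: scan_file ./quarantine/suspicious.jpg
```

A "clean" verdict only means no known signature matched, so the metadata and source-IP checks still apply.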
6. Credential Compromise
Trigger: CloudTrail unusual API calls, GuardDuty finding
Immediate Actions:
# 1. List all access keys for the affected user
aws iam list-access-keys --user-name <username>
# 2. Deactivate compromised keys (without --user-name the command targets the caller's own user)
aws iam update-access-key --user-name <username> --access-key-id <key-id> --status Inactive
# 3. Delete compromised keys
aws iam delete-access-key --user-name <username> --access-key-id <key-id>
# 4. Create new keys
aws iam create-access-key --user-name <username>
Recovery:
- Audit all API calls made with compromised credentials
- Check for unauthorized resource creation
- Rotate all secrets that may have been exposed
- Enable MFA if not already enabled
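The API-call audit can start from CloudTrail's event history, which supports lookups by access key ID. A minimal sketch, assuming GNU date and a default window of seven days (the window is an assumption, not a policy from this runbook):

```shell
# List management events recorded for one access key.
# lookup-events covers management events only, not S3 data events.
audit_access_key() {
  local key_id="$1" since="${2:-7 days ago}"
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=AccessKeyId,AttributeValue=$key_id" \
    --start-time "$(date -u -d "$since" +%Y-%m-%dT%H:%M:%SZ)" \
    --query 'Events[].[EventTime,EventSource,EventName]' --output text
}

# Usage: audit_access_key <key-id> '48 hours ago'
```

Create*, Run*, and Put* event names in the output are the quickest pointers to unauthorized resource creation.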
7. Forensics Data Collection
7.1 Preserve Evidence
# CloudTrail management events (last 24 hours). lookup-events does not return
# S3 data events such as GetObject; retrieve those from the trail's S3 log bucket.
mkdir -p forensics
aws cloudtrail lookup-events \
--start-time "$(date -d '24 hours ago' -Iseconds)" > forensics/cloudtrail.json
# CloudWatch Logs (--destination takes the bucket name; the key prefix is a separate option)
aws logs create-export-task --log-group-name /aws/lambda/image-processor-proc \
--from $(date -d '24 hours ago' +%s)000 --to $(date +%s)000 \
--destination forensics-bucket --destination-prefix logs
# S3 access logs
aws s3 cp s3://image-processor-logs-ACCOUNT/s3-access-logs/ ./forensics/s3-logs/ --recursive
7.2 Chain of Custody
Document:
- Time of incident detection
- Personnel involved
- Actions taken (with timestamps)
- Evidence collected (with hashes)
- Systems affected
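The "evidence collected (with hashes)" item can be covered by a SHA-256 manifest of the forensics directory. A minimal sketch, assuming GNU coreutils and findutils; the manifest file name is hypothetical and is kept outside the evidence directory so it does not hash itself.

```shell
# Write a SHA-256 manifest for every file under the evidence directory.
hash_evidence() {
  local dir="${1:-./forensics}" manifest="${2:-./evidence-manifest.sha256}"
  # Sort the file list so the manifest is reproducible between runs.
  find "$dir" -type f -print0 | sort -z | xargs -0 -r sha256sum > "$manifest"
}

# Usage: hash_evidence ./forensics ./evidence-manifest.sha256
# Verify later with: sha256sum -c ./evidence-manifest.sha256
```

Recording the manifest's own hash in the incident ticket gives later reviewers a single value to check against.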
8. Communication Templates
8.1 Internal Notification
SECURITY INCIDENT NOTIFICATION
Incident ID: INC-YYYY-XXXX
Severity: [Critical/High/Medium/Low]
Status: [Investigating/Contained/Resolved]
Summary: [Brief description]
Impact: [Systems/data affected]
Actions Taken: [List of containment steps]
Next Update: [Time]
Contact: [Incident commander]
8.2 External Notification (if required)
SECURITY ADVISORY
Date: [Date]
Affected Service: AWS Image Processing
Description: [Factual, non-technical summary]
Customer Action: [If customers need to take action]
Status: [Investigating/Resolved]
Contact: security@company.com
9. Post-Incident
9.1 Required Documentation
- Incident timeline (minute-by-minute)
- Root cause analysis
- Impact assessment
- Remediation actions
- Lessons learned
9.2 Follow-up Actions
| Timeframe | Action |
|---|---|
| 24 hours | Initial incident report |
| 72 hours | Root cause analysis |
| 1 week | Remediation complete |
| 2 weeks | Post-incident review |
| 30 days | Security control updates |
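The timeframes in the table can be turned into concrete due times from the detection timestamp. A minimal sketch, assuming GNU date and treating 1 week as 168 hours, 2 weeks as 336 hours, and 30 days as 720 hours:

```shell
# Print the due time (UTC) for each follow-up action, given the
# detection time as an ISO 8601 timestamp.
followup_deadlines() {
  local start
  start=$(date -u -d "$1" +%s) || return 1
  while read -r hours action; do
    printf '%s  %s\n' \
      "$(date -u -d "@$((start + hours * 3600))" +%Y-%m-%dT%H:%MZ)" "$action"
  done <<'EOF'
24 Initial incident report
72 Root cause analysis
168 Remediation complete
336 Post-incident review
720 Security control updates
EOF
}

# Usage: followup_deadlines 2026-02-22T10:00:00Z
```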
Review Schedule: This runbook must be tested quarterly via tabletop exercise and updated after each incident.