# Incident Response Runbook
**Classification:** Confidential
**Version:** 1.0

---
## Quick Reference
| Incident Type | First Step | Escalation |
|---------------|------------|------------|
| Compromised credentials | Rotate IAM keys | Security team |
| Data breach | Isolate S3 bucket | Legal + Security |
| DoS attack | Enable WAF | AWS Support |
| Malware in images | Quarantine bucket | Security team |
| KMS key compromised | Disable key, create new | AWS Support |
---
## 1. Security Alert Response
### 1.1 Lambda Error Alarm
**Trigger:** `lambda-errors > 5 in 5 minutes`
**Steps:**
1. Check CloudWatch Logs: `/aws/lambda/image-processor-proc`
2. Identify error pattern (input validation, timeout, permissions)
3. If input validation failures dominate, treat them as a possible attack attempt
4. If permissions errors appear, check for recent IAM role changes
5. Document findings in incident ticket
**Recovery:**
- Deploy fix if code-related
- Update input validation if attack-related
- Notify users if service impacted
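
The triage in step 2 can be roughed out with a small log classifier. This is a minimal sketch, not the project's tooling: the function name and the validation and permissions patterns are assumptions, and only `Task timed out` is a message Lambda itself emits. The `aws logs tail` example in the comment assumes AWS CLI v2.

```bash
# Count error categories in Lambda log lines read from stdin.
# Patterns other than "Task timed out" are illustrative assumptions;
# adjust them to the messages your handler actually logs.
classify_lambda_errors() {
  awk '
    /Task timed out/                        { timeout++;     next }
    /AccessDenied|not authorized/           { permissions++; next }
    /ValidationError|Invalid (image|input)/ { validation++;  next }
    /ERROR/                                 { other++ }
    END {
      printf "validation=%d timeout=%d permissions=%d other=%d\n",
             validation, timeout, permissions, other
    }'
}

# Example (needs AWS credentials and AWS CLI v2):
# aws logs tail /aws/lambda/image-processor-proc --since 15m | classify_lambda_errors
```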
---
## 2. Data Breach Response
### 2.1 S3 Bucket Compromise
**Trigger:** GuardDuty finding, unusual access patterns
**Immediate Actions (0-15 min):**
```bash
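# 1. Block all access to the affected bucket and its objects, except for the
#    responder role (<responder-role-arn> is a placeholder). Without the
#    exemption, the Deny also locks out responders and blocks step 2.
aws s3api put-bucket-policy --bucket image-processor-ACCOUNT \
--policy '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Principal":"*","Action":"s3:*","Resource":["arn:aws:s3:::image-processor-ACCOUNT","arn:aws:s3:::image-processor-ACCOUNT/*"],"Condition":{"ArnNotEquals":{"aws:PrincipalArn":"<responder-role-arn>"}}}]}'
# 2. Enable S3 Object Lock (requires versioning). This alone does not protect
#    existing objects; apply a legal hold or retention to each version to keep.
aws s3api put-bucket-versioning --bucket image-processor-ACCOUNT \
--versioning-configuration Status=Enabled
aws s3api put-object-lock-configuration --bucket image-processor-ACCOUNT \
--object-lock-configuration '{"ObjectLockEnabled":"Enabled"}'
# 3. Capture access logs (--recursive is required to copy a prefix)
aws s3 cp s3://image-processor-logs-ACCOUNT/s3-access-logs/ ./forensics/s3-logs/ --recursive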
```
**Investigation (15-60 min):**
1. Review S3 access logs for unauthorized IPs
2. Check CloudTrail for API call anomalies
3. Identify compromised credentials
4. Scope data exposure (list affected objects)
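
Step 1 of the investigation can be sketched as a filter over the downloaded S3 server access logs. The function name and the allow-list argument are assumptions for illustration. The sketch relies on the documented access log layout, in which the remote IP is the fifth whitespace-separated field because the bracketed timestamp spans two fields.

```bash
# Print "count ip" for every remote IP that is not in the allow-list,
# busiest first. Reads S3 server access log lines from stdin.
# Usage: cat forensics/s3-logs/* | unknown_ips "203.0.113.10 203.0.113.11"
unknown_ips() {
  awk -v allow="$1" '
    BEGIN { n = split(allow, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    NF >= 5 && !($5 in ok) { hits[$5]++ }
    END { for (ip in hits) print hits[ip], ip }
  ' | sort -k1,1nr -k2,2
}
```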
**Containment (1-4 hours):**
1. Rotate all IAM credentials
2. Revoke suspicious sessions
3. Enable CloudTrail log file validation
4. Notify AWS Security
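
One way to revoke suspicious role sessions (step 2) is to attach an inline policy that denies everything to sessions issued before a cutoff time. The sketch below is modelled on that pattern; the function name, the policy name, and `<role-name>` are placeholders rather than anything defined in this repository.

```bash
# Emit an inline policy that denies every action to role sessions whose
# credentials were issued before the given UTC timestamp (default: now).
revoke_sessions_policy() {
  local cutoff="${1:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  cat <<EOF
{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*","Condition":{"DateLessThan":{"aws:TokenIssueTime":"${cutoff}"}}}]}
EOF
}

# Example (needs AWS credentials; <role-name> is a placeholder):
# aws iam put-role-policy --role-name <role-name> \
#   --policy-name RevokeOlderSessions \
#   --policy-document "$(revoke_sessions_policy)"
```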
**Recovery (4-24 hours):**
1. Create new bucket with hardened policy
2. Restore from backup if needed
3. Re-enable services incrementally
4. Post-incident review
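
What "hardened" means for the replacement bucket is not specified here. As one possible baseline, the sketch below blocks public access, enables versioning, and sets default SSE-KMS encryption. The function name and both arguments are assumptions, and the bucket is expected to exist already.

```bash
# Apply baseline hardening to an existing bucket; stops at the first failure.
# Usage: harden_bucket <bucket> <kms-key-id>
harden_bucket() {
  local bucket="$1" key_id="$2"
  aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true &&
  aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled &&
  aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration \
    "{\"Rules\":[{\"ApplyServerSideEncryptionByDefault\":{\"SSEAlgorithm\":\"aws:kms\",\"KMSMasterKeyID\":\"${key_id}\"}}]}"
}
```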
---
## 3. KMS Key Compromise
**Trigger:** KMS key state alarm, unauthorized key usage events in CloudTrail
**Immediate Actions:**
```bash
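# 1. Disable the key (blocks all cryptographic operations, including
#    decryption of existing objects, until the key is re-enabled)
aws kms disable-key --key-id <key-id>
# 2. Create new key
aws kms create-key --description "Emergency replacement key"
# 3. Update Lambda environment (--environment replaces the full variable
#    set, so include every existing variable alongside KMS_KEY_ID)
aws lambda update-function-configuration \
--function-name image-processor-proc \
--environment "Variables={...,KMS_KEY_ID=<new-key-id>}"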
```
**Recovery:**
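1. Re-encrypt all S3 objects with new key (S3 needs the old key to decrypt, so re-enable it with `aws kms enable-key` only for the duration of the copy)
2. Update all references to old key
3. Schedule old key for deletion (30-day window)
4. Audit all CloudTrail events that used the old key (for example `Decrypt` and `GenerateDataKey`)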
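
Recovery steps 1 and 3 might look like the following. This is a sketch under assumptions: the function names are hypothetical, and the in-place copy only works while the old key is enabled. On a versioned bucket the copy creates new object versions, so older versions stay encrypted with the old key.

```bash
# Re-encrypt every object in a bucket in place with a new KMS key.
# Usage: reencrypt_bucket <bucket> <new-key-id>
reencrypt_bucket() {
  local bucket="$1" new_key="$2"
  aws s3 cp "s3://${bucket}/" "s3://${bucket}/" --recursive \
    --sse aws:kms --sse-kms-key-id "$new_key"
}

# Schedule the old key for deletion with the 30-day waiting period.
# Usage: retire_key <old-key-id>
retire_key() {
  aws kms schedule-key-deletion --key-id "$1" --pending-window-in-days 30
}
```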
---
## 4. DoS Attack Response
**Trigger:** Lambda throttles, CloudWatch spike
**Immediate Actions:**
```bash
# 1. Reduce Lambda concurrency to limit blast radius
aws lambda put-function-concurrency \
--function-name image-processor-proc \
--reserved-concurrent-executions 1
# 2. Enable S3 Requester Pays (anonymous requests are then rejected and authenticated requesters bear the request cost)
aws s3api put-bucket-request-payment \
--bucket image-processor-ACCOUNT \
--request-payment-configuration '{"Payer":"Requester"}'
```
**Mitigation:**
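1. Subscribe to AWS Shield Advanced if escalated (Shield Standard is always on)
2. Serve S3 content through a CloudFront distribution and attach AWS WAF rules to it (WAF cannot be attached to S3 directly)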
3. Implement request rate limiting
4. Block suspicious IP ranges
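
Step 4 can be approximated at the bucket level with an `aws:SourceIp` deny condition. In this sketch the function name and the `Sid` are assumptions. Note that `put-bucket-policy` replaces the whole bucket policy, so merge the statement into the existing policy instead of applying it alone.

```bash
# Build a bucket policy that denies all S3 actions from the given CIDR ranges.
# Usage: deny_ip_policy <bucket> <cidr> [<cidr> ...]
deny_ip_policy() {
  local bucket="$1"; shift
  local cidrs="" cidr
  for cidr in "$@"; do
    cidrs="${cidrs:+${cidrs},}\"${cidr}\""   # comma-separated JSON strings
  done
  printf '{"Version":"2012-10-17","Statement":[{"Sid":"DenySuspiciousIPs","Effect":"Deny","Principal":"*","Action":"s3:*","Resource":["arn:aws:s3:::%s","arn:aws:s3:::%s/*"],"Condition":{"IpAddress":{"aws:SourceIp":[%s]}}}]}\n' \
    "$bucket" "$bucket" "$cidrs"
}

# Example (needs AWS credentials):
# aws s3api put-bucket-policy --bucket image-processor-ACCOUNT \
#   --policy "$(deny_ip_policy image-processor-ACCOUNT 198.51.100.0/24)"
```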
---
## 5. Malware Detection
**Trigger:** GuardDuty S3 finding, unusual file patterns
**Immediate Actions:**
```bash
# 1. Quarantine affected objects
aws s3 cp s3://image-processor-ACCOUNT/uploads/suspicious.jpg \
s3://image-processor-ACCOUNT/quarantine/suspicious.jpg
# 2. Remove from uploads
aws s3 rm s3://image-processor-ACCOUNT/uploads/suspicious.jpg
# 3. Tag for investigation
aws s3api put-object-tagging \
--bucket image-processor-ACCOUNT \
--key quarantine/suspicious.jpg \
--tagging "TagSet=[{Key=Status,Value=Quarantined},{Key=Date,Value=$(date -u +%F)}]"
```
**Analysis:**
1. Download quarantine file to isolated environment
2. Scan with ClamAV or VirusTotal API
3. Check file metadata for origin
4. Review upload source IP in access logs
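
Steps 1 and 2 could be sketched as follows once the file sits in the isolated environment. The function name is hypothetical. The sketch assumes GNU `sha256sum` and uses ClamAV's `clamscan` only if it is installed.

```bash
# Record the SHA-256 hash of a quarantined file, then scan it with ClamAV
# when clamscan is available. The first output line is "<sha256>  <file>".
scan_quarantined() {
  local file="$1"
  sha256sum "$file" || return 1
  if command -v clamscan >/dev/null 2>&1; then
    clamscan --no-summary "$file"
  else
    echo "clamscan not installed; skipping scan" >&2
  fi
}

# Example (run inside the isolated environment only):
# aws s3 cp s3://image-processor-ACCOUNT/quarantine/suspicious.jpg ./isolated/
# scan_quarantined ./isolated/suspicious.jpg
```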
---
## 6. Credential Compromise
**Trigger:** CloudTrail unusual API calls, GuardDuty finding
**Immediate Actions:**
```bash
# 1. List all access keys for the affected user
aws iam list-access-keys --user-name <username>
# 2. Deactivate compromised keys (without --user-name, the CLI targets the caller's own user)
aws iam update-access-key --user-name <username> --access-key-id <key-id> --status Inactive
# 3. Delete compromised keys
aws iam delete-access-key --user-name <username> --access-key-id <key-id>
# 4. Create new keys
aws iam create-access-key --user-name <username>
```
**Recovery:**
1. Audit all API calls made with compromised credentials
2. Check for unauthorized resource creation
3. Rotate all secrets that may have been exposed
4. Enable MFA if not already enabled
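
For step 1, CloudTrail event history can be filtered by access key ID. The sketch below assumes GNU `date`, and the function name is hypothetical. It covers management events only; data events such as S3 object reads come from the trail's log bucket instead.

```bash
# List management events made with a given access key over the last N days
# (default 7), as tab-separated time, service, and event name.
# Usage: audit_access_key <access-key-id> [days]
audit_access_key() {
  local key_id="$1" days="${2:-7}"
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=AccessKeyId,AttributeValue=${key_id}" \
    --start-time "$(date -u -d "${days} days ago" +%Y-%m-%dT%H:%M:%SZ)" \
    --query 'Events[].[EventTime,EventSource,EventName]' --output text
}
```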
---
## 7. Forensics Data Collection
### 7.1 Preserve Evidence
```bash
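# CloudTrail management events (last 24 hours; date -d needs GNU date).
# lookup-events does not return S3 data events such as GetObject; pull those
# from the trail's S3 log bucket if data event logging is enabled.
mkdir -p forensics
aws cloudtrail lookup-events \
--start-time "$(date -d '24 hours ago' -Iseconds)" > forensics/cloudtrail.json
# CloudWatch Logs (--destination takes a bucket name; the prefix is separate)
aws logs create-export-task --log-group-name /aws/lambda/image-processor-proc \
--from "$(date -d '24 hours ago' +%s)000" --to "$(date +%s)000" \
--destination forensics-bucket --destination-prefix logs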
# S3 access logs
aws s3 cp s3://image-processor-logs-ACCOUNT/s3-access-logs/ ./forensics/s3-logs/ --recursive
```
### 7.2 Chain of Custody
Document:
- [ ] Time of incident detection
- [ ] Personnel involved
- [ ] Actions taken (with timestamps)
- [ ] Evidence collected (with hashes)
- [ ] Systems affected
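
The "evidence collected (with hashes)" item can be supported by a simple manifest. This is a minimal sketch assuming GNU coreutils and findutils. The function name and the manifest layout are illustrative, not a mandated format. Write the manifest outside the evidence directory so that it does not list itself.

```bash
# Print a manifest of every file under an evidence directory: a header with
# the UTC collection time and collector, then "sha256  bytes  path" per file.
# Usage: evidence_manifest forensics > forensics-manifest.txt
evidence_manifest() {
  local dir="$1" f
  echo "# collected_at=$(date -u +%Y-%m-%dT%H:%M:%SZ) collected_by=${USER:-unknown}"
  find "$dir" -type f -print0 | sort -z | while IFS= read -r -d '' f; do
    printf '%s  %s  %s\n' \
      "$(sha256sum "$f" | cut -d' ' -f1)" \
      "$(wc -c < "$f" | tr -d ' ')" \
      "$f"
  done
}
```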
---
## 8. Communication Templates
### 8.1 Internal Notification
```
SECURITY INCIDENT NOTIFICATION
Incident ID: INC-YYYY-XXXX
Severity: [Critical/High/Medium/Low]
Status: [Investigating/Contained/Resolved]
Summary: [Brief description]
Impact: [Systems/data affected]
Actions Taken: [List of containment steps]
Next Update: [Time]
Contact: [Incident commander]
```
### 8.2 External Notification (if required)
```
SECURITY ADVISORY
Date: [Date]
Affected Service: AWS Image Processing
Description: [Factual, non-technical summary]
Customer Action: [If customers need to take action]
Status: [Investigating/Resolved]
Contact: security@company.com
```
---
## 9. Post-Incident
### 9.1 Required Documentation
1. Incident timeline (minute-by-minute)
2. Root cause analysis
3. Impact assessment
4. Remediation actions
5. Lessons learned
### 9.2 Follow-up Actions
| Timeframe | Action |
|-----------|--------|
| 24 hours | Initial incident report |
| 72 hours | Root cause analysis |
| 1 week | Remediation complete |
| 2 weeks | Post-incident review |
| 30 days | Security control updates |
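
The timeframes in the table can be turned into concrete due dates from the detection time. The sketch below assumes GNU `date`. It also assumes each timeframe is measured from detection, which the table does not state, and the function name is hypothetical.

```bash
# Print a UTC due date for each follow-up action in the table above.
# Usage: followup_deadlines "2026-02-22T05:37:03Z"
followup_deadlines() {
  local start_s hours action
  start_s=$(date -u -d "$1" +%s) || return 1
  # Offsets are in hours: 1 week = 168, 2 weeks = 336, 30 days = 720.
  while IFS='|' read -r hours action; do
    printf '%s  %s\n' \
      "$(date -u -d "@$((start_s + hours * 3600))" +%Y-%m-%dT%H:%MZ)" "$action"
  done <<'EOF'
24|Initial incident report
72|Root cause analysis
168|Remediation complete
336|Post-incident review
720|Security control updates
EOF
}
```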
**Review Schedule:** This runbook must be tested quarterly via tabletop exercise and updated after each incident.