tf-aws-lambda-imageprocessing/INCIDENT_RESPONSE.md

# Incident Response Runbook

**Classification:** Confidential
**Version:** 1.0

---

## Quick Reference

| Incident Type | First Step | Escalation |
|---------------|------------|------------|
| Compromised credentials | Rotate IAM keys | Security team |
| Data breach | Isolate S3 bucket | Legal + Security |
| DoS attack | Cap Lambda concurrency | AWS Support |
| Malware in images | Quarantine bucket | Security team |
| KMS key compromised | Disable key, create new | AWS Support |

---

## 1. Security Alert Response

### 1.1 Lambda Error Alarm

**Trigger:** `lambda-errors > 5 in 5 minutes`

**Steps:**
1. Check CloudWatch Logs: `/aws/lambda/image-processor-proc`
2. Identify error pattern (input validation, timeout, permissions)
3. If input validation failures spike: treat as a possible attack
4. If permission errors appear: check CloudTrail for recent IAM role or policy changes
5. Document findings in incident ticket
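
The triage in steps 1-4 can be roughed out with a small shell helper. This is a minimal sketch rather than the project's own tooling: `classify_lambda_errors` is a hypothetical name, and the `ValidationError` / `Invalid image` patterns are assumed application messages that should be swapped for whatever the function actually logs. `Task timed out` and `AccessDenied` are the strings Lambda and the AWS APIs normally emit.

```bash
# Read Lambda log lines on stdin and print a count per error category.
classify_lambda_errors() {
  awk '
    /Task timed out/                         { timeout++;     next }
    /AccessDenied|not authorized to perform/ { permissions++; next }
    # Assumed application messages; adjust to the real log format.
    /ValidationError|Invalid image/          { validation++;  next }
    /ERROR/                                  { other++ }
    END {
      printf "validation=%d timeout=%d permissions=%d other=%d\n",
             validation, timeout, permissions, other
    }'
}
```

With AWS CLI v2, recent logs could be piped in with `aws logs tail /aws/lambda/image-processor-proc --since 15m | classify_lambda_errors`.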

**Recovery:**
- Deploy fix if code-related
- Update input validation if attack-related
- Notify users if service impacted

---

## 2. Data Breach Response

### 2.1 S3 Bucket Compromise

**Trigger:** GuardDuty finding, unusual access patterns

**Immediate Actions (0-15 min):**
```bash
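# 1. Block all access to the affected bucket and its objects, except for the
#    incident-response role (<ir-role-arn> is a placeholder; without an exemption
#    responders are locked out as well)
aws s3api put-bucket-policy --bucket image-processor-ACCOUNT \
  --policy '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Principal":"*","Action":"s3:*","Resource":["arn:aws:s3:::image-processor-ACCOUNT","arn:aws:s3:::image-processor-ACCOUNT/*"],"Condition":{"ArnNotEquals":{"aws:PrincipalArn":"<ir-role-arn>"}}}]}'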

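# 2. Enable versioning, then S3 Object Lock with a default retention rule.
#    Object Lock requires versioning; GOVERNANCE/30 days is an example value, and
#    default retention only protects object versions written from now on.
aws s3api put-bucket-versioning --bucket image-processor-ACCOUNT \
  --versioning-configuration Status=Enabled
aws s3api put-object-lock-configuration --bucket image-processor-ACCOUNT \
  --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"GOVERNANCE","Days":30}}}'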

# 3. Capture access logs (--recursive is required to copy a whole prefix)
aws s3 cp s3://image-processor-logs-ACCOUNT/s3-access-logs/ ./forensics/s3-logs/ --recursive
```

**Investigation (15-60 min):**
1. Review S3 access logs for unauthorized IPs
2. Check CloudTrail for API call anomalies
3. Identify compromised credentials
4. Scope data exposure (list affected objects)
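
Steps 1 and 4 can be approximated from the downloaded access logs with two small helpers. This is a sketch that assumes the standard S3 server access log format, where the bracketed timestamp occupies two whitespace-separated fields; `top_source_ips` and `objects_touched_by` are hypothetical names, not existing project scripts.

```bash
# Count requests per remote IP, busiest first.
# The bracketed timestamp spans fields 3-4, so the remote IP is field 5.
top_source_ips() {
  awk 'NF >= 9 { count[$5]++ } END { for (ip in count) print count[ip], ip }' "$@" |
    sort -k1,1nr -k2,2
}

# List the distinct operation/key pairs for one IP
# (field 8 = operation, field 9 = object key, "-" when no key applies).
objects_touched_by() {
  local ip="$1"; shift
  awk -v ip="$ip" '$5 == ip && $9 != "-" { print $8, $9 }' "$@" | sort -u
}
```

For example, `top_source_ips ./forensics/s3-logs/*` gives a quick ranking, and `objects_touched_by <ip> ./forensics/s3-logs/*` scopes what a suspicious address read or wrote. Keys appear URL-encoded in the logs.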

**Containment (1-4 hours):**
1. Rotate all IAM credentials
2. Revoke suspicious sessions
3. Enable CloudTrail log file validation
4. Notify AWS Security
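
Temporary role sessions cannot be revoked one by one. The usual approach for step 2 is a deny policy keyed on `aws:TokenIssueTime`, which invalidates every session issued before a cutoff. The helper below is a sketch: `revoke_sessions_policy` is a hypothetical name, and the cutoff defaults to the current UTC time.

```bash
# Print an IAM policy that denies every action for sessions whose
# credentials were issued before the cutoff (ISO 8601, UTC).
revoke_sessions_policy() {
  local cutoff="${1:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "DateLessThan": { "aws:TokenIssueTime": "${cutoff}" }
      }
    }
  ]
}
EOF
}
```

It could be attached with `aws iam put-role-policy --role-name <role-name> --policy-name RevokeOlderSessions --policy-document "$(revoke_sessions_policy)"`, where the role and policy names are placeholders. Sessions issued after the cutoff are unaffected, so the role's trust policy should be fixed first.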

**Recovery (4-24 hours):**
1. Create new bucket with hardened policy
2. Restore from backup if needed
3. Re-enable services incrementally
4. Post-incident review
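
The runbook does not define "hardened", so the following is only one possible baseline for step 1: a bucket policy that rejects any request not made over TLS. `tls_only_bucket_policy` is a hypothetical helper name, and the policy would normally be combined with the project's own access rules.

```bash
# Print a bucket policy that denies all non-TLS requests to the bucket.
tls_only_bucket_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ],
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    }
  ]
}
EOF
}
```

A new bucket would typically also get `aws s3api put-public-access-block --bucket <new-bucket> --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true` before any traffic is re-enabled.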

---

## 3. KMS Key Compromise

**Trigger:** KMS key state alarm, unauthorized key usage events in CloudTrail (for example `Decrypt` or `GenerateDataKey`)

**Immediate Actions:**
```bash
# 1. Disable the key (blocks all cryptographic operations, including decryption of existing objects)
aws kms disable-key --key-id <key-id>

# 2. Create new key
aws kms create-key --description "Emergency replacement key"

# 3. Update Lambda environment (--environment replaces the whole variable set, so list every existing variable)
aws lambda update-function-configuration \
  --function-name image-processor-proc \
  --environment "Variables={...,KMS_KEY_ID=<new-key-id>}"
```

**Recovery:**
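1. Re-encrypt all S3 objects with the new key (the old key must be temporarily re-enabled under a restricted key policy, because S3 needs it to decrypt the existing objects)
2. Update all references to the old key
3. Schedule the old key for deletion once re-encryption is verified (30-day waiting period)
4. Audit all KMS CloudTrail events for the old key (`Decrypt`, `Encrypt`, `GenerateDataKey`)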
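
Re-encryption in step 1 is commonly done by copying each object onto itself with the new key. The sketch below assumes SSE-KMS objects and uses hypothetical helper names; S3 must be able to decrypt with the old key during the copy, so it only works while that key is enabled.

```bash
# Copy one object onto itself so S3 rewrites it under the new KMS key.
reencrypt_object() {
  local bucket="$1" key="$2" new_key_id="$3"
  aws s3 cp "s3://${bucket}/${key}" "s3://${bucket}/${key}" \
    --sse aws:kms --sse-kms-key-id "${new_key_id}" </dev/null
}

# Re-encrypt every key read from stdin (one per line);
# report failures on stderr and keep going.
reencrypt_keys() {
  local bucket="$1" new_key_id="$2" key
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    reencrypt_object "$bucket" "$key" "$new_key_id" ||
      echo "FAILED: ${key}" >&2
  done
}
```

A key list could be fed in with `aws s3api list-objects-v2 --bucket image-processor-ACCOUNT --query 'Contents[].Key' --output text | tr '\t' '\n' | reencrypt_keys image-processor-ACCOUNT <new-key-id>`. On a versioned bucket the copy creates a new version, and older versions stay encrypted under the old key.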

---

## 4. DoS Attack Response

**Trigger:** Lambda throttles, CloudWatch spike

**Immediate Actions:**
```bash
# 1. Reduce Lambda concurrency to limit blast radius
aws lambda put-function-concurrency \
  --function-name image-processor-proc \
  --reserved-concurrent-executions 1

# 2. Enable S3 Requester Pays (rejects anonymous requests; authenticated requesters pay request and transfer costs)
aws s3api put-bucket-request-payment \
  --bucket image-processor-ACCOUNT \
  --request-payment-configuration '{"Payer":"Requester"}'
```

**Mitigation:**
1. Enable AWS Shield Advanced (if escalated; Shield Standard is always on)
2. Front S3 with a CloudFront distribution and attach AWS WAF rules to it (WAF cannot attach to S3 directly)
3. Implement request rate limiting
4. Block suspicious IP ranges
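
One way to carry out step 4 at the bucket level is a deny statement keyed on `aws:SourceIp`. The helper below is a sketch with a hypothetical name; it reads CIDR ranges from stdin and prints a complete policy document.

```bash
# Print a bucket policy that denies all S3 actions from the CIDR
# ranges given on stdin (one per line).
ip_block_policy() {
  local bucket="$1" cidrs
  # Join the CIDRs into a JSON array body: "a","b",...
  cidrs=$(awk 'NF { printf "%s\"%s\"", sep, $1; sep = "," }')
  cat <<EOF
{"Version":"2012-10-17","Statement":[{"Sid":"BlockSuspiciousIPs","Effect":"Deny","Principal":"*","Action":"s3:*","Resource":["arn:aws:s3:::${bucket}","arn:aws:s3:::${bucket}/*"],"Condition":{"IpAddress":{"aws:SourceIp":[${cidrs}]}}}]}
EOF
}
```

Because `put-bucket-policy` replaces the whole policy, the printed statement should be merged into the existing policy rather than applied on its own. `aws:SourceIp` is also not evaluated as a public address for requests that arrive through a VPC endpoint.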

---

## 5. Malware Detection

**Trigger:** GuardDuty S3 finding, unusual file patterns

**Immediate Actions:**
```bash
# 1. Quarantine affected objects
aws s3 cp s3://image-processor-ACCOUNT/uploads/suspicious.jpg \
  s3://image-processor-ACCOUNT/quarantine/suspicious.jpg

# 2. Remove from uploads
aws s3 rm s3://image-processor-ACCOUNT/uploads/suspicious.jpg

# 3. Tag for investigation
aws s3api put-object-tagging \
  --bucket image-processor-ACCOUNT \
  --key quarantine/suspicious.jpg \
  --tagging "TagSet=[{Key=Status,Value=Quarantined},{Key=Date,Value=$(date -u +%F)}]"
```

**Analysis:**
1. Download the quarantined file to an isolated environment
2. Scan with ClamAV or VirusTotal API
3. Check file metadata for origin
4. Review upload source IP in access logs
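
Before a full scan, a quick check of the file's leading bytes shows whether an upload with an image extension is really an image. This is a sketch with a hypothetical function name; it only reads the first four bytes and never executes the file, and it does not replace the ClamAV or VirusTotal scan.

```bash
# Print "jpeg", "png", "gif", or "unknown" based on the file's magic bytes.
detect_image_type() {
  local file="$1" magic
  # First four bytes as lowercase hex, e.g. "ffd8ffe0"
  magic=$(od -An -tx1 -N4 "$file" | tr -d ' \n')
  case "$magic" in
    ffd8ff*)  echo "jpeg" ;;
    89504e47) echo "png" ;;
    47494638) echo "gif" ;;
    *)        echo "unknown" ;;
  esac
}
```

A file named `suspicious.jpg` that reports `unknown` is a strong hint that the extension was spoofed and is worth noting in the incident ticket.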

---

## 6. Credential Compromise

**Trigger:** CloudTrail unusual API calls, GuardDuty finding

**Immediate Actions:**
```bash
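# 1. List all access keys for the affected IAM user (roles have no long-lived keys)
aws iam list-access-keys --user-name <username>

# 2. Deactivate compromised keys (without --user-name the CLI targets the caller's own user)
aws iam update-access-key --user-name <username> --access-key-id <key-id> --status Inactive

# 3. Delete compromised keys
aws iam delete-access-key --user-name <username> --access-key-id <key-id>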

# 4. Create new keys
aws iam create-access-key --user-name <username>
```

**Recovery:**
1. Audit all API calls made with compromised credentials
2. Check for unauthorized resource creation
3. Rotate all secrets that may have been exposed
4. Enable MFA if not already enabled
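
Steps 1 and 2 can start from CloudTrail's event history, which can be filtered by access key ID. The helpers below are a sketch with hypothetical names; `lookup-events` covers only management events from the last 90 days in the current region, so data events and older activity have to come from the trail's log bucket.

```bash
# Print time, service, and event name (tab-separated) for every
# management event recorded for one access key.
audit_access_key() {
  local access_key_id="$1"
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=AccessKeyId,AttributeValue=${access_key_id}" \
    --query 'Events[].[EventTime,EventSource,EventName]' --output text
}

# Summarize the rows from audit_access_key as "count service:event",
# most frequent first.
summarize_events() {
  awk -F'\t' 'NF >= 3 { n[$2 ":" $3]++ } END { for (k in n) print n[k], k }' |
    sort -k1,1nr -k2,2
}
```

Running `audit_access_key <key-id> | summarize_events` gives a quick view of which services the key touched. Calls such as `CreateUser`, `CreateAccessKey`, or `RunInstances` deserve a closer look for unauthorized resource creation.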

---

## 7. Forensics Data Collection

### 7.1 Preserve Evidence

```bash
# CloudTrail management events (last 24 hours). lookup-events does not return
# data events such as GetObject; pull those from the trail's S3 log bucket.
mkdir -p forensics
aws cloudtrail lookup-events \
  --start-time "$(date -d '24 hours ago' -Iseconds)" > forensics/cloudtrail.json

# CloudWatch Logs (--destination takes a bucket name, not an s3:// URL)
aws logs create-export-task --log-group-name /aws/lambda/image-processor-proc \
  --from "$(date -d '24 hours ago' +%s)000" --to "$(date +%s)000" \
  --destination forensics-bucket --destination-prefix logs

# S3 access logs
aws s3 cp s3://image-processor-logs-ACCOUNT/s3-access-logs/ ./forensics/s3-logs/ --recursive
```

### 7.2 Chain of Custody

Document:
- [ ] Time of incident detection
- [ ] Personnel involved
- [ ] Actions taken (with timestamps)
- [ ] Evidence collected (with hashes)
- [ ] Systems affected
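
The "with hashes" item can be covered by a SHA-256 manifest of the collected evidence. The sketch below assumes GNU coreutils and findutils (as the `date -d` calls in this runbook already do) and uses a hypothetical function name; the manifest is written next to the evidence directory rather than inside it, so it does not hash itself.

```bash
# Write a SHA-256 manifest for every file under the evidence directory.
# The default output is "<dir>.sha256", with paths relative to <dir>.
write_manifest() {
  local dir="${1%/}"
  local manifest="${2:-${dir}.sha256}"
  ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}
```

For example, `write_manifest forensics` produces `forensics.sha256`, and `(cd forensics && sha256sum -c ../forensics.sha256)` verifies the evidence later. Recording the manifest's own hash in the incident ticket ties it to the timeline.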

---

## 8. Communication Templates

### 8.1 Internal Notification

```
SECURITY INCIDENT NOTIFICATION

Incident ID: INC-YYYY-XXXX
Severity: [Critical/High/Medium/Low]
Status: [Investigating/Contained/Resolved]

Summary: [Brief description]

Impact: [Systems/data affected]

Actions Taken: [List of containment steps]

Next Update: [Time]

Contact: [Incident commander]
```

### 8.2 External Notification (if required)

```
SECURITY ADVISORY

Date: [Date]
Affected Service: AWS Image Processing

Description: [Factual, non-technical summary]

Customer Action: [If customers need to take action]

Status: [Investigating/Resolved]

Contact: security@company.com
```

---

## 9. Post-Incident

### 9.1 Required Documentation

1. Incident timeline (minute-by-minute)
2. Root cause analysis
3. Impact assessment
4. Remediation actions
5. Lessons learned

### 9.2 Follow-up Actions

| Timeframe | Action |
|-----------|--------|
| 24 hours | Initial incident report |
| 72 hours | Root cause analysis |
| 1 week | Remediation complete |
| 2 weeks | Post-incident review |
| 30 days | Security control updates |

**Review Schedule:** This runbook must be tested quarterly via tabletop exercise and updated after each incident.