Incident Response Plan

Status: DRAFT
Owner: Engineering
Last Review: 2026-05-03
Applicable Standards: SOC 2 (CC7.3, CC7.4, CC7.5) / GDPR (Art. 33, Art. 34) / SEC (data breach disclosure)

1. Purpose

This document defines the incident response procedures for the Equa platform. It covers how incidents are detected, who is notified, how they are contained and remediated, and how post-incident reviews are conducted.

2. Scope

Component	In Scope	Notes
equa-server	Yes	Application-level incidents, API outages, data breaches
equa-web	Yes	Frontend availability, client-side security issues
PostgreSQL	Yes	Database incidents, data corruption, unauthorized access
AWS S3	Yes	Document storage incidents, access control breaches
Railway	Yes	Current interim hosting incidents, deployment failures, health-check regressions
Google Cloud	Yes	Legacy deploy-path and any remaining managed-service incidents
equabot-gateway	Yes	AI agent incidents, permission violations

For incidents involving the AI agent (Equanaut), also refer to the gateway-specific incident response procedures documented in the equabot-gateway repository. Agent-specific controls include rate limiting (AGENT_MAX_TOOL_CALLS_PER_MINUTE, AGENT_MAX_WRITE_OPS_PER_MINUTE, AGENT_MAX_DESTRUCTIVE_PER_HOUR) and the permission proxy that enforces user-level permissions on all agent tool calls.

Source: equa-server/modules/agent/src/security/guardrails.ts
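To illustrate how limits of this kind could be enforced, the following is a minimal sketch of a sliding-window limiter with one window per limit category. It is not the code in guardrails.ts. The class name, the category names, and the numeric limits are assumptions made for this example; only the three environment variable names come from this document.

```typescript
// Hypothetical sketch: a sliding-window limiter for agent tool calls.
// The numeric limits below are placeholders, not Equa's configured values.
type Category = "toolCall" | "writeOp" | "destructive";

interface Limit {
  max: number; // maximum events allowed inside the window
  windowMs: number; // window length in milliseconds
}

// In a real deployment these would be read from AGENT_MAX_TOOL_CALLS_PER_MINUTE,
// AGENT_MAX_WRITE_OPS_PER_MINUTE, and AGENT_MAX_DESTRUCTIVE_PER_HOUR.
const exampleLimits: Record<Category, Limit> = {
  toolCall: { max: 60, windowMs: 60_000 },
  writeOp: { max: 10, windowMs: 60_000 },
  destructive: { max: 2, windowMs: 3_600_000 },
};

class SlidingWindowLimiter {
  private limits: Record<Category, Limit>;
  private events = new Map<Category, number[]>();

  constructor(limits: Record<Category, Limit>) {
    this.limits = limits;
  }

  // Records the event and returns true if it fits in the window; returns false otherwise.
  tryAcquire(category: Category, nowMs: number): boolean {
    const { max, windowMs } = this.limits[category];
    // Keep only timestamps that are still inside the window.
    const recent = (this.events.get(category) ?? []).filter(
      (t) => nowMs - t < windowMs,
    );
    const allowed = recent.length < max;
    if (allowed) recent.push(nowMs);
    this.events.set(category, recent);
    return allowed;
  }
}
```

A rejected call (a `false` return) is the kind of event the "Agent guardrails" detection mechanism would be expected to log as a rate limit violation.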

3. Incident Severity Levels

Severity	Description	Examples	Response Time
P1 — Critical	Service outage or confirmed data breach	Database compromise, production down, unauthorized data access	Immediate (within 15 minutes)
P2 — High	Degraded service or suspected security incident	Partial outage, unusual access patterns, failed deployment causing errors	Within 1 hour
P3 — Medium	Non-critical issue with potential security impact	Elevated error rates, dependency vulnerability disclosed, suspicious login activity	Within 4 hours
P4 — Low	Minor issue, no immediate security impact	Performance degradation, non-critical bug, informational security alert	Within 24 hours
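The response times in the table above can be encoded so that tooling can flag an overdue acknowledgement. This is a minimal sketch; the function and constant names are hypothetical, and the minute values are taken directly from the table.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Response targets in minutes, copied from the severity table.
const RESPONSE_TARGET_MINUTES: Record<Severity, number> = {
  P1: 15,
  P2: 60,
  P3: 4 * 60,
  P4: 24 * 60,
};

// Latest time by which a responder should have acknowledged the incident.
function responseDeadline(severity: Severity, detectedAt: Date): Date {
  return new Date(
    detectedAt.getTime() + RESPONSE_TARGET_MINUTES[severity] * 60_000,
  );
}

// True once the response target for this severity has passed.
function isResponseOverdue(
  severity: Severity,
  detectedAt: Date,
  now: Date,
): boolean {
  return now.getTime() > responseDeadline(severity, detectedAt).getTime();
}
```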

4. Phase 1: Detection

4.1 Automated Detection

Mechanism	What It Detects	Current Status
Health endpoint monitoring	Railway service health and application availability	Implemented (`/health` plus managed platform checks)
Error logging	Application exceptions, unhandled errors	Implemented (application logs)
Managed platform metrics	Edge status codes, request failures, service restarts	Partially available (Railway dashboard / edge responses; legacy Google metrics still apply if that path is used)
Database health	Production PostgreSQL availability, connection pool exhaustion	Partially available (provider-specific verification still needed)
Agent guardrails	Tool call rate limit violations, unauthorized write operations	Implemented (`equa-server/modules/agent/src/security/guardrails.ts`)

4.2 Manual Detection

Source	What It Detects
User reports	Functionality issues, unexpected behavior, suspicious activity
Team observation	Unusual patterns during routine operations
Third-party notification	Vulnerability disclosure, vendor security advisory

4.3 Detection Gaps

The following gaps in detection coverage should be closed to improve incident identification.

Gap	Recommendation
No external uptime monitoring	Deploy a third-party uptime monitor (e.g., Better Uptime, Pingdom)
No alerting on error rate spikes	Configure managed-platform alerts for 5xx rate exceeding threshold
No authentication anomaly detection	Monitor for brute-force patterns, credential stuffing, geographic anomalies
No audit log anomaly detection	Alert on unusual admin actions, bulk data access, or privilege escalation
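As an illustration of the error-rate recommendation, the sketch below evaluates a 5xx rate over one observation window. The 5% threshold and the minimum sample size used in the usage note are placeholder assumptions, not agreed values, and the type and function names are hypothetical.

```typescript
// Request counts for one observation window (for example, the last five minutes).
interface RequestWindow {
  total: number; // all requests seen in the window
  serverErrors: number; // responses with a 5xx status
}

// Fraction of requests that returned 5xx; 0 when there was no traffic.
function fiveXxRate(w: RequestWindow): number {
  return w.total === 0 ? 0 : w.serverErrors / w.total;
}

// Alert only when there is enough traffic for the rate to be meaningful
// and the rate is strictly above the threshold.
function shouldAlert(
  w: RequestWindow,
  threshold: number,
  minRequests: number,
): boolean {
  if (w.total < minRequests) return false;
  return fiveXxRate(w) > threshold;
}
```

For example, `shouldAlert({ total: 1000, serverErrors: 60 }, 0.05, 100)` returns `true`, while a window of only 10 requests returns `false` regardless of how many failed. The minimum sample size avoids paging on a handful of failed requests during quiet periods.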

5. Phase 2: Notification

5.1 Internal Notification

When an incident is detected, the following notification chain is activated:

Step	Action	Responsible
1	Incident detected (automated alert or manual report)	Detection system / reporter
2	Incident logged with severity level, timestamp, and initial description	First responder
3	Incident lead assigned based on severity and type	Engineering lead
4	Notification sent to relevant team members	Incident lead
5	For P1/P2: executive stakeholders notified	Incident lead
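Steps 2, 4, and 5 of the chain above can be sketched as a small record type and a routing function. The identifier format follows the report template in section 8.2; the field names, the recipient labels, and the idea of a per-day sequence number are assumptions made for this example.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

interface IncidentRecord {
  id: string;
  severity: Severity;
  detectedAt: string; // ISO 8601 timestamp in UTC
  description: string;
}

// Builds an identifier in the INCIDENT-YYYY-MM-DD-NNN format.
function incidentId(detectedAt: Date, sequence: number): string {
  const day = detectedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `INCIDENT-${day}-${String(sequence).padStart(3, "0")}`;
}

// Step 2: log the incident with severity, timestamp, and initial description.
function logIncident(
  severity: Severity,
  detectedAt: Date,
  sequence: number,
  description: string,
): IncidentRecord {
  return {
    id: incidentId(detectedAt, sequence),
    severity,
    detectedAt: detectedAt.toISOString(),
    description,
  };
}

// Steps 4 and 5: team members are always notified; executives only for P1/P2.
function notificationTargets(severity: Severity): string[] {
  const targets = ["relevant-team-members"];
  if (severity === "P1" || severity === "P2") {
    targets.push("executive-stakeholders");
  }
  return targets;
}
```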

5.2 External Notification

Scenario	Notification Required	Timeline
Confirmed data breach (PII)	Affected users; relevant supervisory authority	GDPR Art. 33: supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of the breach; Art. 34: affected users without undue delay when the breach is likely to result in a high risk to them
Confirmed data breach (financial)	Affected users, state attorneys general (per state breach notification laws)	Varies by state; typically 30 to 60 days
Service outage	Affected users via status page or email	As soon as impact is confirmed
Vulnerability in third-party dependency	No external notification unless exploited	Internal assessment first
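Because the GDPR Article 33 window is counted in hours from the moment the organization becomes aware of the breach, it helps to compute the deadline explicitly when the incident is logged. A minimal sketch follows; the function names are hypothetical.

```typescript
const GDPR_AUTHORITY_WINDOW_HOURS = 72;
const HOUR_MS = 3_600_000;

// Deadline for notifying the supervisory authority, counted from the time
// the organization became aware of the breach (not from when it occurred).
function authorityNotificationDeadline(awareAt: Date): Date {
  return new Date(awareAt.getTime() + GDPR_AUTHORITY_WINDOW_HOURS * HOUR_MS);
}

// Hours left before the deadline; negative once the deadline has passed.
function hoursRemaining(awareAt: Date, now: Date): number {
  return (
    (authorityNotificationDeadline(awareAt).getTime() - now.getTime()) / HOUR_MS
  );
}
```

Recording the "became aware" timestamp in the incident log at step 2 of the internal notification chain gives this calculation an unambiguous starting point.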

6. Phase 3: Containment

6.1 Immediate Containment Actions

Action	When to Use	How
Isolate Railway service	Suspected compromised current runtime	Redeploy or restart the affected Railway service, or route traffic away from the unhealthy edge
Revoke sessions	Suspected credential compromise	Truncate the sessions table or invalidate specific user sessions via `equa-server/modules/auth/src/sessions.ts`
Disable user account	Confirmed compromised account	Set `Users.enabled = false`; destroy all active sessions
Block IP range	Active attack from identifiable source	Configure the active CDN/WAF/firewall layer for the affected host
Rotate secrets	Suspected secret exposure	Rotate `API_SESSION_SECRET`, `TWO_FACTOR_PRIVATE_KEY`, database credentials, OAuth secrets; redeploy
Enable maintenance mode	Widespread compromise requiring investigation	Deploy a static maintenance page; stop processing requests
Disable agent	AI agent acting outside expected parameters	Revoke agent permissions via permission proxy; disable agent tool access

6.2 Managed Platform Containment

The current interim stack is hosted on managed platforms, so containment depends on the platform serving the affected host:

Railway service restart / redeploy — the current app/API edge can be restarted or redeployed quickly when app.equa.cc is unhealthy
Health-check gate — failed /health checks and bad edge responses help confirm whether the fault is inside the app container or at the host edge
Service disable — the affected managed service can be stopped if necessary
Legacy Cloud Run revision routing — if the legacy Cloud Run path is reactivated for a backend incident, revision-based traffic shifting is still available there

6.3 Database Containment

Read-only or restricted-write mode — use the managed PostgreSQL provider controls that are available for the live database service
Point-in-time recovery — use managed PostgreSQL backup/restore controls if they are enabled for the live provider
Connection kill — Active database connections can be terminated to stop ongoing unauthorized queries
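The connection kill step can be sketched as a parameterized statement built on PostgreSQL's `pg_stat_activity` view and `pg_terminate_backend` function. The helper name and the example role name are hypothetical, and whether the managed provider grants the privileges needed to terminate other sessions is an assumption that must be verified for the live database. The `{ text, values }` shape is chosen to match the query config object accepted by common Node.js PostgreSQL clients such as node-postgres.

```typescript
interface SqlStatement {
  text: string; // SQL with positional placeholders
  values: string[]; // parameters bound to $1, $2, ...
}

// Builds a statement that terminates every session opened by the given role,
// excluding the session that runs the statement itself.
function terminateConnectionsForRole(roleName: string): SqlStatement {
  return {
    text:
      "SELECT pid, pg_terminate_backend(pid) AS terminated " +
      "FROM pg_stat_activity " +
      "WHERE usename = $1 AND pid <> pg_backend_pid()",
    values: [roleName],
  };
}

// Example: the role name below is a placeholder, not a real Equa database role.
const statement = terminateConnectionsForRole("suspect_role");
```

Passing the role name as a bound parameter rather than concatenating it into the SQL keeps the containment step itself from becoming an injection risk during a stressful response.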

7. Phase 4: Remediation

7.1 Root Cause Analysis

Collect evidence — Preserve logs, database snapshots, and affected container images before any remediation
Timeline reconstruction — Build a chronological timeline of the incident from first indicator to detection
Attack vector identification — Determine how the incident occurred (vulnerability, misconfiguration, credential compromise, etc.)
Impact assessment — Identify all affected data, users, and systems

7.2 Remediation Actions

Category	Actions
Code fix	Patch the vulnerability, deploy via normal CI/CD pipeline with expedited review
Configuration fix	Update infrastructure configuration (IAM, firewall, Railway settings, or legacy Cloud Run settings)
Credential rotation	Rotate all potentially compromised credentials and secrets
Data restoration	Restore from backup if data was corrupted or deleted
User notification	Notify affected users with clear description of impact and actions taken
Monitoring enhancement	Add detection rules to catch similar incidents in the future

7.3 Verification

Before declaring the incident resolved:

Deploy the fix to a staging environment and verify
Deploy to production
Monitor for recurrence (minimum 24 hours for P1/P2)
Confirm all containment measures have been reversed (or intentionally kept)
Verify affected systems are operating normally
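The checklist above can be expressed as a single gate that lists what is still outstanding before an incident is closed. This is a minimal sketch: the type and field names are hypothetical, and treating P3/P4 as having no minimum monitoring window is an assumption, since the checklist only specifies 24 hours for P1/P2.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

interface VerificationState {
  severity: Severity;
  verifiedInStaging: boolean;
  deployedToProductionAt: Date | null; // null until the fix is live
  containmentReviewed: boolean; // each measure reversed or intentionally kept
  systemsHealthy: boolean;
}

const HOUR_MS = 3_600_000;

// 24 hours for P1/P2 comes from the checklist; 0 for P3/P4 is an assumption.
function minimumMonitoringMs(severity: Severity): number {
  return severity === "P1" || severity === "P2" ? 24 * HOUR_MS : 0;
}

// Returns the unmet conditions; an empty list means the incident can be closed.
function unmetResolutionChecks(state: VerificationState, now: Date): string[] {
  const unmet: string[] = [];
  if (!state.verifiedInStaging) unmet.push("fix not verified in staging");
  if (state.deployedToProductionAt === null) {
    unmet.push("fix not deployed to production");
  } else if (
    now.getTime() - state.deployedToProductionAt.getTime() <
    minimumMonitoringMs(state.severity)
  ) {
    unmet.push("monitoring window still open");
  }
  if (!state.containmentReviewed) unmet.push("containment measures not reviewed");
  if (!state.systemsHealthy) unmet.push("affected systems not confirmed healthy");
  return unmet;
}
```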

8. Phase 5: Post-Incident Review

8.1 Timeline

Severity	Review Deadline
P1 — Critical	Within 48 hours of resolution
P2 — High	Within 5 business days
P3 — Medium	Within 10 business days
P4 — Low	Monthly batch review
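The deadlines in this table mix clock hours and business days, so a worked calculation is useful. The sketch below is illustrative only: the function names are hypothetical, business days are assumed to be Monday through Friday in UTC, and public holidays are ignored.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Adds whole business days, skipping Saturdays and Sundays (UTC).
function addBusinessDays(start: Date, days: number): Date {
  const result = new Date(start.getTime());
  let remaining = days;
  while (remaining > 0) {
    result.setUTCDate(result.getUTCDate() + 1);
    const dayOfWeek = result.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (dayOfWeek !== 0 && dayOfWeek !== 6) remaining--;
  }
  return result;
}

// Returns null for P4, which is covered by the monthly batch review
// rather than by a per-incident deadline.
function reviewDeadline(severity: Severity, resolvedAt: Date): Date | null {
  switch (severity) {
    case "P1":
      return new Date(resolvedAt.getTime() + 48 * 3_600_000);
    case "P2":
      return addBusinessDays(resolvedAt, 5);
    case "P3":
      return addBusinessDays(resolvedAt, 10);
    case "P4":
      return null;
  }
}
```

For a P2 incident resolved on Friday 2026-02-20 at 12:00 UTC, this yields a review deadline of Friday 2026-02-27 at 12:00 UTC.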

8.2 Post-Incident Report Template

Incident Report: [INCIDENT-YYYY-MM-DD-NNN]

Summary:        [One-sentence description]
Severity:       [P1/P2/P3/P4]
Duration:       [Detection time to resolution time]
Impact:         [Users affected, data affected, service degradation]

Timeline:
  [timestamp] — [event description]
  [timestamp] — [event description]
  ...

Root Cause:     [Technical description of what went wrong]
Contributing Factors: [Process, tooling, or organizational factors]

Remediation:
  - [Action taken]
  - [Action taken]

Prevention:
  - [Improvement to prevent recurrence]
  - [Improvement to detect earlier]
  - [Improvement to contain faster]

Action Items:
  - [ ] [Specific task] — Owner: [name] — Due: [date]
  - [ ] [Specific task] — Owner: [name] — Due: [date]

8.3 Blameless Culture

Post-incident reviews focus on systemic improvements, not individual fault. The goal is to understand what happened, why existing controls failed to prevent or detect it, and what changes will reduce the likelihood and impact of similar incidents.

9. Roles and Responsibilities

Role	Responsibilities
Incident Lead	Coordinates response, makes containment decisions, owns communication
Engineering Responder	Investigates technical root cause, implements fixes
Communications Lead	Drafts user notifications, updates status page, handles external inquiries
Executive Sponsor	Approves external communications for P1/P2, allocates resources

10. Annual Review

This incident response plan is reviewed and updated:

Annually as part of the security program review
After every P1/P2 incident to incorporate lessons learned
When infrastructure changes affect detection or containment capabilities

11. Cross-References

Topic	Document
Security controls and encryption	Security Architecture
Access control and permission model	Access Control Model
Audit logging and event tracking	Audit Trail Design
Data breach notification (GDPR)	Data Privacy and GDPR
Evidence retention after incidents	Data Retention Policy

12. Regulatory References

Standard	Requirement	Current Status
SOC 2 CC7.3	Evaluate security events to determine if they are incidents	Implemented: severity classification defined (P1 to P4)
SOC 2 CC7.4	Respond to identified security incidents	Documented: containment and remediation procedures defined
SOC 2 CC7.5	Identify the cause of incidents and take corrective action	Documented: post-incident review process with root cause analysis
GDPR Art. 33	Notification of personal data breach to supervisory authority within 72 hours	Documented: external notification timeline defined
GDPR Art. 34	Communication of personal data breach to data subjects	Documented: user notification procedures defined
SEC	Material cybersecurity incident disclosure (Form 8-K for public companies)	Noted: Equa currently serves private companies; relevant if customers have public parent entities

13. Revision History

Date	Version	Author	Changes
2026-02-21	0.1	Agent (Phase 5 Session A)	Initial draft
2026-02-21	0.2	Agent (Phase 5 Session B)	Template alignment (status header, scope table, numbered sections), agent incident cross-reference, regulatory references table, cross-references to other compliance docs
