Incident Response Plan

Status: DRAFT
Owner: Engineering
Last Review: 2026-05-03
Applicable Standards: SOC 2 (CC7.3, CC7.4, CC7.5) / GDPR (Art. 33, Art. 34) / SEC (data breach disclosure)

1. Purpose

This document defines the incident response procedures for the Equa platform. It covers how incidents are detected, who is notified, how they are contained and remediated, and how post-incident reviews are conducted.

2. Scope

| Component | In Scope | Notes |
| --- | --- | --- |
| equa-server | Yes | Application-level incidents, API outages, data breaches |
| equa-web | Yes | Frontend availability, client-side security issues |
| PostgreSQL | Yes | Database incidents, data corruption, unauthorized access |
| AWS S3 | Yes | Document storage incidents, access control breaches |
| Railway | Yes | Current interim hosting incidents, deployment failures, health-check regressions |
| Google Cloud | Yes | Legacy deploy-path and any remaining managed-service incidents |
| equabot-gateway | Yes | AI agent incidents, permission violations |

For incidents involving the AI agent (Equanaut), also refer to the gateway-specific incident response procedures documented in the equabot-gateway repository. Agent-specific controls include rate limiting (AGENT_MAX_TOOL_CALLS_PER_MINUTE, AGENT_MAX_WRITE_OPS_PER_MINUTE, AGENT_MAX_DESTRUCTIVE_PER_HOUR) and the permission proxy that enforces user-level permissions on all agent tool calls.

Source: equa-server/modules/agent/src/security/guardrails.ts
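
To illustrate how per-minute and per-hour limits like the ones named above might be enforced, here is a minimal sliding-window sketch. It is not the code in guardrails.ts: the class name, the operation kinds, and the numeric limits are hypothetical placeholders, and only the three environment variable names come from this plan.

```typescript
// Hypothetical sketch of a sliding-window limiter; NOT the code in guardrails.ts.
type AgentOpKind = "toolCall" | "write" | "destructive";

interface WindowLimit {
  max: number;      // maximum operations allowed inside the window
  windowMs: number; // window length in milliseconds
}

// Placeholder values (assumptions). In practice these would presumably be read
// from AGENT_MAX_TOOL_CALLS_PER_MINUTE, AGENT_MAX_WRITE_OPS_PER_MINUTE, and
// AGENT_MAX_DESTRUCTIVE_PER_HOUR.
const ASSUMED_LIMITS: Record<AgentOpKind, WindowLimit> = {
  toolCall: { max: 60, windowMs: 60_000 },
  write: { max: 10, windowMs: 60_000 },
  destructive: { max: 5, windowMs: 3_600_000 },
};

class SlidingWindowLimiter {
  private readonly history = new Map<string, number[]>();
  private readonly limits: Record<AgentOpKind, WindowLimit>;

  constructor(limits: Record<AgentOpKind, WindowLimit>) {
    this.limits = limits;
  }

  // Returns true and records the operation if it is within the limit,
  // or false (recording nothing) if the limit has already been reached.
  tryAcquire(agentId: string, kind: AgentOpKind, nowMs: number): boolean {
    const { max, windowMs } = this.limits[kind];
    const key = `${agentId}:${kind}`;
    // Keep only timestamps that still fall inside the trailing window.
    const recent = (this.history.get(key) ?? []).filter((t) => nowMs - t < windowMs);
    const allowed = recent.length < max;
    if (allowed) recent.push(nowMs);
    this.history.set(key, recent);
    return allowed;
  }
}
```

A rejected call (a `false` return) is the kind of event the "Agent guardrails" detection mechanism would surface as a rate limit violation.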

3. Incident Severity Levels

| Severity | Description | Examples | Response Time |
| --- | --- | --- | --- |
| P1 (Critical) | Service outage or confirmed data breach | Database compromise, production down, unauthorized data access | Immediate (within 15 minutes) |
| P2 (High) | Degraded service or suspected security incident | Partial outage, unusual access patterns, failed deployment causing errors | Within 1 hour |
| P3 (Medium) | Non-critical issue with potential security impact | Elevated error rates, dependency vulnerability disclosed, suspicious login activity | Within 4 hours |
| P4 (Low) | Minor issue, no immediate security impact | Performance degradation, non-critical bug, informational security alert | Within 24 hours |
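
The response-time targets in the table can be turned into concrete deadlines. The sketch below assumes the clock starts at detection time; the function names are illustrative and not part of any existing Equa module.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Response-time targets in minutes, taken from the severity table.
const RESPONSE_TARGET_MINUTES: Record<Severity, number> = {
  P1: 15,
  P2: 60,
  P3: 4 * 60,
  P4: 24 * 60,
};

// Assumption: the target is measured from the moment of detection.
function responseDeadline(severity: Severity, detectedAt: Date): Date {
  return new Date(detectedAt.getTime() + RESPONSE_TARGET_MINUTES[severity] * 60_000);
}

function isResponseOverdue(severity: Severity, detectedAt: Date, now: Date): boolean {
  return now.getTime() > responseDeadline(severity, detectedAt).getTime();
}
```

For example, a P1 detected at 00:00 UTC has a response deadline of 00:15 UTC the same day.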

4. Phase 1: Detection

4.1 Automated Detection

| Mechanism | What It Detects | Current Status |
| --- | --- | --- |
| Health endpoint monitoring | Railway service health and application availability | Implemented (/health plus managed platform checks) |
| Error logging | Application exceptions, unhandled errors | Implemented (application logs) |
| Managed platform metrics | Edge status codes, request failures, service restarts | Partially available (Railway dashboard / edge responses; legacy Google metrics still apply if that path is used) |
| Database health | Production PostgreSQL availability, connection pool exhaustion | Partially available (provider-specific verification still needed) |
| Agent guardrails | Tool call rate limit violations, unauthorized write operations | Implemented (equa-server/modules/agent/src/security/guardrails.ts) |

4.2 Manual Detection

| Source | What It Detects |
| --- | --- |
| User reports | Functionality issues, unexpected behavior, suspicious activity |
| Team observation | Unusual patterns during routine operations |
| Third-party notification | Vulnerability disclosure, vendor security advisory |

4.3 Detection Gaps

The following detection capabilities should be implemented to improve incident identification.

| Gap | Recommendation |
| --- | --- |
| No external uptime monitoring | Deploy a third-party uptime monitor (e.g., Better Uptime, Pingdom) |
| No alerting on error rate spikes | Configure managed-platform alerts for 5xx rate exceeding threshold |
| No authentication anomaly detection | Monitor for brute-force patterns, credential stuffing, geographic anomalies |
| No audit log anomaly detection | Alert on unusual admin actions, bulk data access, or privilege escalation |
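
As an illustration of the 5xx-rate recommendation, the following sketch computes the server-error rate over a trailing window and decides whether to alert. The window length, the 5% threshold, and the minimum request count are assumptions chosen for the example, not values defined by this plan; a managed-platform alert rule would normally replace hand-written code like this.

```typescript
interface RequestSample {
  timestampMs: number; // when the response was served
  statusCode: number;  // HTTP status code
}

function samplesInWindow(samples: RequestSample[], nowMs: number, windowMs: number): RequestSample[] {
  return samples.filter((s) => s.timestampMs > nowMs - windowMs && s.timestampMs <= nowMs);
}

// Fraction of requests in the trailing window that returned a 5xx status.
function serverErrorRate(samples: RequestSample[], nowMs: number, windowMs: number): number {
  const recent = samplesInWindow(samples, nowMs, windowMs);
  if (recent.length === 0) return 0;
  const errors = recent.filter((s) => s.statusCode >= 500 && s.statusCode <= 599).length;
  return errors / recent.length;
}

// Defaults are assumptions: 5-minute window, 5% threshold, at least 20 requests
// so that a handful of failures on a quiet service does not page anyone.
function shouldAlertOnErrorRate(
  samples: RequestSample[],
  nowMs: number,
  windowMs: number = 5 * 60_000,
  threshold: number = 0.05,
  minRequests: number = 20,
): boolean {
  const recentCount = samplesInWindow(samples, nowMs, windowMs).length;
  return recentCount >= minRequests && serverErrorRate(samples, nowMs, windowMs) > threshold;
}
```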

5. Phase 2: Notification

5.1 Internal Notification

When an incident is detected, the following notification chain is activated:

| Step | Action | Responsible |
| --- | --- | --- |
| 1 | Incident detected (automated alert or manual report) | Detection system / reporter |
| 2 | Incident logged with severity level, timestamp, and initial description | First responder |
| 3 | Incident lead assigned based on severity and type | Engineering lead |
| 4 | Notification sent to relevant team members | Incident lead |
| 5 | For P1/P2: executive stakeholders notified | Incident lead |
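
Steps 2 through 5 can be sketched as a small data structure plus a routing rule. The field names and audience labels below are hypothetical placeholders; the plan itself does not prescribe a log format or distribution lists.

```typescript
type IncidentSeverity = "P1" | "P2" | "P3" | "P4";

// Step 2: the minimum fields the plan asks the first responder to record.
interface IncidentLogEntry {
  severity: IncidentSeverity;
  loggedAt: string;            // ISO 8601 timestamp
  description: string;
  incidentLead: string | null; // filled in at step 3
}

function logIncident(severity: IncidentSeverity, description: string, now: Date): IncidentLogEntry {
  return { severity, loggedAt: now.toISOString(), description, incidentLead: null };
}

// Steps 4 and 5: audience labels are illustrative, not real distribution lists.
function audiencesToNotify(entry: IncidentLogEntry): string[] {
  const audiences = ["relevant team members"];
  if (entry.severity === "P1" || entry.severity === "P2") {
    audiences.push("executive stakeholders");
  }
  return audiences;
}
```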

5.2 External Notification

| Scenario | Notification Required | Timeline |
| --- | --- | --- |
| Confirmed data breach (PII) | Affected users and the relevant supervisory authority | GDPR Article 33: within 72 hours of becoming aware of the breach, to the authority; Article 34: without undue delay, to users |
| Confirmed data breach (financial) | Affected users, state attorneys general (per state breach notification laws) | Varies by state; typically 30 to 60 days |
| Service outage | Affected users via status page or email | As soon as impact is confirmed |
| Vulnerability in third-party dependency | No external notification unless exploited | Internal assessment first |
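
Because the GDPR Article 33 window is a hard deadline, it helps to compute it explicitly when a breach is confirmed. This is a minimal sketch that assumes the 72-hour clock starts at the moment the organization becomes aware of the breach; the function names are illustrative.

```typescript
const GDPR_AUTHORITY_WINDOW_HOURS = 72;
const MS_PER_HOUR = 3_600_000;

// Deadline for notifying the supervisory authority (GDPR Art. 33).
function authorityNotificationDeadline(awareAt: Date): Date {
  return new Date(awareAt.getTime() + GDPR_AUTHORITY_WINDOW_HOURS * MS_PER_HOUR);
}

// Hours left before the deadline; negative once it has passed.
function hoursUntilDeadline(awareAt: Date, now: Date): number {
  return (authorityNotificationDeadline(awareAt).getTime() - now.getTime()) / MS_PER_HOUR;
}
```

Recording the "became aware" timestamp in the incident log makes this calculation auditable afterwards.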

6. Phase 3: Containment

6.1 Immediate Containment Actions

| Action | When to Use | How |
| --- | --- | --- |
| Isolate Railway service | Suspected compromised current runtime | Redeploy or restart the affected Railway service, or route traffic away from the unhealthy edge |
| Revoke sessions | Suspected credential compromise | Truncate the sessions table or invalidate specific user sessions via equa-server/modules/auth/src/sessions.ts |
| Disable user account | Confirmed compromised account | Set Users.enabled = false; destroy all active sessions |
| Block IP range | Active attack from identifiable source | Configure the active CDN/WAF/firewall layer for the affected host |
| Rotate secrets | Suspected secret exposure | Rotate API_SESSION_SECRET, TWO_FACTOR_PRIVATE_KEY, database credentials, OAuth secrets; redeploy |
| Enable maintenance mode | Widespread compromise requiring investigation | Deploy a static maintenance page; stop processing requests |
| Disable agent | AI agent acting outside expected parameters | Revoke agent permissions via permission proxy; disable agent tool access |
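
The "Disable user account" row combines two steps that should happen together: disabling the account and destroying its sessions. The sketch below models that with in-memory stand-ins for the Users and sessions tables. The types and function name are hypothetical and do not reflect the actual API of sessions.ts.

```typescript
// In-memory stand-ins for the Users and sessions tables (assumed shapes).
interface UserAccount {
  id: string;
  enabled: boolean;
}

interface SessionRecord {
  sessionId: string;
  userId: string;
}

// Disables the account and returns the session list with that user's sessions removed.
function containCompromisedAccount(
  users: Map<string, UserAccount>,
  sessions: SessionRecord[],
  userId: string,
): { remainingSessions: SessionRecord[]; destroyedCount: number } {
  const user = users.get(userId);
  if (user === undefined) throw new Error(`unknown user: ${userId}`);
  user.enabled = false; // mirrors setting Users.enabled = false
  const remainingSessions = sessions.filter((s) => s.userId !== userId);
  return { remainingSessions, destroyedCount: sessions.length - remainingSessions.length };
}
```

Disabling the account first matters: if sessions were destroyed while the account was still enabled, an attacker holding the credentials could simply log in again.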

6.2 Managed Platform Containment

The current interim stack is hosted on managed platforms, so containment depends on the platform serving the affected host:
  • Railway service restart / redeploy — the current app/API edge can be restarted or redeployed quickly when app.equa.cc is unhealthy
  • Health-check gate — failed /health checks and bad edge responses help confirm whether the fault is inside the app container or at the host edge
  • Service disable — the affected managed service can be stopped if necessary
  • Legacy Cloud Run revision routing — if the legacy Cloud Run path is reactivated for a backend incident, revision-based traffic shifting is still available there

6.3 Database Containment

  • Read-only or restricted-write mode — use the managed PostgreSQL provider controls that are available for the live database service
  • Point-in-time recovery — use managed PostgreSQL backup/restore controls if they are enabled for the live provider
  • Connection kill — Active database connections can be terminated to stop ongoing unauthorized queries
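
For the connection-kill step, PostgreSQL exposes active sessions in the pg_stat_activity view and can end them with pg_terminate_backend. The sketch below only builds a parameterized query for a given database role; the function name and the role used in any call are hypothetical. Running the query requires sufficient privileges (superuser or membership in pg_signal_backend), which a managed provider may restrict.

```typescript
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Builds a query that terminates every connection opened by the given role,
// excluding the connection that runs the query itself.
function buildTerminateConnectionsQuery(roleName: string): ParameterizedQuery {
  if (roleName.trim() === "") throw new Error("roleName must not be empty");
  return {
    text:
      "SELECT pid, pg_terminate_backend(pid) AS terminated " +
      "FROM pg_stat_activity " +
      "WHERE usename = $1 AND pid <> pg_backend_pid()",
    values: [roleName],
  };
}
```

Terminating connections does not stop the role from reconnecting, so this step is normally paired with rotating or revoking the affected credentials.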

7. Phase 4: Remediation

7.1 Root Cause Analysis

  1. Collect evidence — Preserve logs, database snapshots, and affected container images before any remediation
  2. Timeline reconstruction — Build a chronological timeline of the incident from first indicator to detection
  3. Attack vector identification — Determine how the incident occurred (vulnerability, misconfiguration, credential compromise, etc.)
  4. Impact assessment — Identify all affected data, users, and systems

7.2 Remediation Actions

| Category | Actions |
| --- | --- |
| Code fix | Patch the vulnerability, deploy via normal CI/CD pipeline with expedited review |
| Configuration fix | Update infrastructure configuration (IAM, firewall, Railway settings, or legacy Cloud Run settings) |
| Credential rotation | Rotate all potentially compromised credentials and secrets |
| Data restoration | Restore from backup if data was corrupted or deleted |
| User notification | Notify affected users with clear description of impact and actions taken |
| Monitoring enhancement | Add detection rules to catch similar incidents in the future |

7.3 Verification

Before declaring the incident resolved:
  1. Deploy the fix to a staging environment and verify
  2. Deploy to production
  3. Monitor for recurrence (minimum 24 hours for P1/P2)
  4. Confirm all containment measures have been reversed (or intentionally kept)
  5. Verify affected systems are operating normally
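
The verification steps above can be expressed as a checklist function that reports what still blocks resolution. This is a sketch under stated assumptions: the field names are hypothetical, the monitoring window is measured from the production deploy, and no minimum window is applied to P3/P4 because the plan does not define one.

```typescript
interface ResolutionState {
  severity: "P1" | "P2" | "P3" | "P4";
  verifiedInStaging: boolean;
  deployedToProductionAt: Date | null;
  recurrenceObserved: boolean;
  containmentReviewed: boolean; // every measure reversed or intentionally kept
  systemsOperatingNormally: boolean;
}

const MIN_MONITORING_MS = 24 * 60 * 60 * 1000; // minimum for P1/P2 per the plan

// Returns the checks that still block resolution; an empty array means ready.
function unmetResolutionChecks(state: ResolutionState, now: Date): string[] {
  const unmet: string[] = [];
  if (!state.verifiedInStaging) unmet.push("fix not verified in staging");
  if (state.deployedToProductionAt === null) {
    unmet.push("fix not deployed to production");
  } else {
    const needsWindow = state.severity === "P1" || state.severity === "P2";
    const elapsedMs = now.getTime() - state.deployedToProductionAt.getTime();
    if (needsWindow && elapsedMs < MIN_MONITORING_MS) {
      unmet.push("24-hour monitoring window not complete");
    }
  }
  if (state.recurrenceObserved) unmet.push("recurrence observed during monitoring");
  if (!state.containmentReviewed) unmet.push("containment measures not reviewed");
  if (!state.systemsOperatingNormally) unmet.push("affected systems not operating normally");
  return unmet;
}
```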

8. Phase 5: Post-Incident Review

8.1 Timeline

| Severity | Review Deadline |
| --- | --- |
| P1 (Critical) | Within 48 hours of resolution |
| P2 (High) | Within 5 business days |
| P3 (Medium) | Within 10 business days |
| P4 (Low) | Monthly batch review |
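
The review deadlines mix clock hours and business days, which is easy to miscount by hand. The sketch below assumes that business days are Monday through Friday in UTC with no holiday calendar, and that the P2 and P3 deadlines are counted from resolution like the P1 deadline; neither assumption is stated in the table.

```typescript
type ReviewSeverity = "P1" | "P2" | "P3" | "P4";

// Adds business days, skipping Saturdays and Sundays (UTC). Holidays are ignored.
function addBusinessDays(start: Date, days: number): Date {
  const result = new Date(start.getTime());
  let remaining = days;
  while (remaining > 0) {
    result.setUTCDate(result.getUTCDate() + 1);
    const day = result.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (day !== 0 && day !== 6) remaining -= 1;
  }
  return result;
}

// Returns null for P4, which is handled in the monthly batch review.
function reviewDeadline(severity: ReviewSeverity, resolvedAt: Date): Date | null {
  switch (severity) {
    case "P1":
      return new Date(resolvedAt.getTime() + 48 * 3_600_000);
    case "P2":
      return addBusinessDays(resolvedAt, 5);
    case "P3":
      return addBusinessDays(resolvedAt, 10);
    case "P4":
      return null;
  }
}
```

For an incident resolved on Friday 2026-03-06, the P2 review is due the following Friday, 2026-03-13, because the weekend does not count.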

8.2 Post-Incident Report Template

Incident Report: [INCIDENT-YYYY-MM-DD-NNN]

Summary:        [One-sentence description]
Severity:       [P1/P2/P3/P4]
Duration:       [Detection time to resolution time]
Impact:         [Users affected, data affected, service degradation]

Timeline:
  [timestamp] — [event description]
  [timestamp] — [event description]
  ...

Root Cause:     [Technical description of what went wrong]
Contributing Factors: [Process, tooling, or organizational factors]

Remediation:
  - [Action taken]
  - [Action taken]

Prevention:
  - [Improvement to prevent recurrence]
  - [Improvement to detect earlier]
  - [Improvement to contain faster]

Action Items:
  - [ ] [Specific task] — Owner: [name] — Due: [date]
  - [ ] [Specific task] — Owner: [name] — Due: [date]
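
Report identifiers in the INCIDENT-YYYY-MM-DD-NNN format can be generated consistently with a small helper. The sketch assumes that the date is the UTC detection date and that NNN is a zero-padded per-day sequence number from 1 to 999; the template does not spell out either detail.

```typescript
// Formats an identifier such as INCIDENT-2026-02-21-007.
// Assumptions: UTC date of detection, per-day sequence number in 1..999.
function formatIncidentId(detectedAt: Date, sequence: number): string {
  if (!Number.isInteger(sequence) || sequence < 1 || sequence > 999) {
    throw new RangeError("sequence must be an integer between 1 and 999");
  }
  const datePart = detectedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `INCIDENT-${datePart}-${String(sequence).padStart(3, "0")}`;
}
```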

8.3 Blameless Culture

Post-incident reviews focus on systemic improvements, not individual fault. The goal is to understand what happened, why existing controls failed to prevent or detect it, and what changes will reduce the likelihood and impact of similar incidents.

9. Roles and Responsibilities

| Role | Responsibilities |
| --- | --- |
| Incident Lead | Coordinates response, makes containment decisions, owns communication |
| Engineering Responder | Investigates technical root cause, implements fixes |
| Communications Lead | Drafts user notifications, updates status page, handles external inquiries |
| Executive Sponsor | Approves external communications for P1/P2, allocates resources |

10. Annual Review

This incident response plan is reviewed and updated:
  • Annually as part of the security program review
  • After every P1/P2 incident to incorporate lessons learned
  • When infrastructure changes affect detection or containment capabilities

11. Cross-References

| Topic | Document |
| --- | --- |
| Security controls and encryption | Security Architecture |
| Access control and permission model | Access Control Model |
| Audit logging and event tracking | Audit Trail Design |
| Data breach notification (GDPR) | Data Privacy and GDPR |
| Evidence retention after incidents | Data Retention Policy |

12. Regulatory References

| Standard | Requirement | Current Status |
| --- | --- | --- |
| SOC 2 CC7.3 | Evaluate security events to determine if they are incidents | Implemented: severity classification defined (P1 to P4) |
| SOC 2 CC7.4 | Respond to identified security incidents | Documented: containment and remediation procedures defined |
| SOC 2 CC7.5 | Identify the cause of incidents and take corrective action | Documented: post-incident review process with root cause analysis |
| GDPR Art. 33 | Notification of personal data breach to supervisory authority within 72 hours | Documented: external notification timeline defined |
| GDPR Art. 34 | Communication of personal data breach to data subjects | Documented: user notification procedures defined |
| SEC | Material cybersecurity incident disclosure (Form 8-K for public companies) | Noted: Equa currently serves private companies; relevant if customers have public parent entities |

13. Revision History

| Date | Version | Author | Changes |
| --- | --- | --- | --- |
| 2026-02-21 | 0.1 | Agent (Phase 5 Session A) | Initial draft |
| 2026-02-21 | 0.2 | Agent (Phase 5 Session B) | Template alignment (status header, scope table, numbered sections), agent incident cross-reference, regulatory references table, cross-references to other compliance docs |