Overview
At CircleCI, we take Server incidents seriously and have a dedicated process to ensure rapid response and resolution when critical issues impact your operations. This guide explains what you can expect when you experience a P0 (Priority 0) incident.
What Qualifies as a P0 Incident?
A P0 incident is declared when you experience:
Fatal system failure (server down or unresponsive)
Builds not running
Outage impacting critical system operations
Security breach
Expired license preventing operations
How to Report an Incident
Submit a Zendesk ticket and mark it as P0/Urgent
Provide a support bundle from your server installation
If possible, run Reality Check before submitting the ticket
Our Support team will review your ticket to confirm the severity and rule out external factors like cloud provider outages.
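As an illustration only, the reporting steps above can be expressed as a small pre-submission check. This is a minimal sketch, not a CircleCI tool: the names P0Criterion, TicketDraft, and review_draft are hypothetical, and the split between required and optional items is an assumption based on the wording of the steps (Reality Check is requested only "if possible"). Support still confirms the final severity.

```python
from dataclasses import dataclass
from enum import Enum


class P0Criterion(Enum):
    """The P0 conditions listed in this guide (enum names are hypothetical)."""
    FATAL_SYSTEM_FAILURE = "Fatal system failure (server down or unresponsive)"
    BUILDS_NOT_RUNNING = "Builds not running"
    CRITICAL_OUTAGE = "Outage impacting critical system operations"
    SECURITY_BREACH = "Security breach"
    EXPIRED_LICENSE = "Expired license preventing operations"


@dataclass
class TicketDraft:
    """Hypothetical summary of a ticket before it is submitted."""
    criteria: frozenset = frozenset()
    marked_urgent: bool = False
    support_bundle_attached: bool = False
    reality_check_run: bool = False


def review_draft(draft):
    """Return (required, optional) lists of reminders for a draft P0 ticket."""
    required, optional = [], []
    if not draft.criteria:
        required.append("No P0 condition selected")
    if not draft.marked_urgent:
        required.append("Ticket is not marked P0/Urgent")
    if not draft.support_bundle_attached:
        required.append("Support bundle is missing")
    if not draft.reality_check_run:
        # Assumption: Reality Check is best-effort, so it is only a reminder.
        optional.append("Reality Check has not been run")
    return required, optional


if __name__ == "__main__":
    draft = TicketDraft(
        criteria=frozenset({P0Criterion.BUILDS_NOT_RUNNING}),
        marked_urgent=True,
    )
    required, optional = review_draft(draft)
    for item in required:
        print("REQUIRED:", item)
    for item in optional:
        print("OPTIONAL:", item)
```

A draft with an empty required list has everything this guide asks for. It does not guarantee that Support will classify the incident as P0.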
What Happens Next?
Step 1: Initial Response
A Support Engineer will reach out and start a Zoom call with you
They'll perform basic checks to understand the scope and impact
If the issue needs to be escalated, the Support Engineer will bring in the engineering team
Step 2: Incident Declaration
An Incident Commander, usually a CircleCI Engineering Manager, will be called in
Members of the appropriate engineering team will be added to the Zoom call as needed
Step 3: Active Response
You'll work directly with:
Support Engineer: Your primary point of contact, who keeps you informed throughout the incident
Incident Commander: Coordinates the technical response and ensures the right resources are engaged
Engineering Response Team: One or more engineers from the team that owns the affected service, who join as needed to investigate and resolve the issue
Step 4: Resolution
The team works continuously until:
The issue is resolved, OR
We mutually agree to pause and resume at a scheduled time
During the Incident
What you can expect:
Regular updates every 30 minutes on progress and current state
Direct access to engineering resources via Zoom
Clear communication about what's being tried and what we're learning about your situation
Coordination with your Field Engineer if necessary
What we need from you:
Access to logs, metrics, and system information
Details about recent changes to your environment
Availability of team members who can provide context or make necessary changes
After Resolution
Within 7 days of resolution, you'll receive:
A detailed Root Cause Analysis (RCA) document explaining what happened
Specific corrective actions we're taking to prevent similar incidents
Recommendations for your environment if applicable
We also conduct internal Post Incident Reviews to continuously improve our response process and product reliability.
Important Notes
Support bundles are critical: These contain the diagnostic information we need to troubleshoot quickly
Zoom calls: We'll record the incident response Zoom call for our internal documentation
No after-hours delays: If your incident occurs outside business hours, our on-call team will respond
Before You Need Us
To prepare for potential incidents:
Know how to generate a support bundle from your installation
Document recent changes to your environment
Identify who on your team can authorize system changes during an incident
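One lightweight way to keep recent changes documented is an append-only change log that you can filter by date when an incident starts. The following is a minimal sketch under stated assumptions: the JSON Lines file, the field names (timestamp, author, description), and the functions record_change and recent_changes are all hypothetical and are not part of any CircleCI tooling.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def record_change(log_path, author, description, when=None):
    """Append one change entry to a JSON Lines file and return the entry."""
    entry = {
        "timestamp": (when or datetime.now(timezone.utc)).isoformat(),
        "author": author,
        "description": description,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def recent_changes(log_path, days=14, now=None):
    """Return entries recorded within the last `days` days, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    path = Path(log_path)
    if not path.exists():
        return []
    entries = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            entry = json.loads(line)
            if datetime.fromisoformat(entry["timestamp"]) >= cutoff:
                entries.append(entry)
    return sorted(entries, key=lambda e: datetime.fromisoformat(e["timestamp"]))


if __name__ == "__main__":
    log = "environment-changes.jsonl"  # hypothetical file name
    record_change(log, "alice", "Rotated TLS certificates")
    for change in recent_changes(log, days=14):
        print(change["timestamp"], change["author"], change["description"])
```

The 14-day window is an arbitrary default. Share whatever time range covers the changes made before the problem first appeared.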
Questions?
If you have questions about our incident response process or want to discuss your specific environment, please reach out to your Technical Success Manager or Field Engineer.