Overview

At CircleCI, we take Server incidents seriously and have a dedicated process to ensure rapid response and resolution when critical issues impact your operations. This guide explains what you can expect when you experience a P0 (Priority 0) incident.

What Qualifies as a P0 Incident?

A P0 incident is declared when you experience:

Fatal system failure (server down or unresponsive)
Builds not running
Outage impacting critical system operations
Security breach
Expired license preventing operations
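
If your team triages alerts before opening a ticket, the criteria above can be captured in a small lookup. The following is a minimal Python sketch, not a CircleCI tool: the `Symptom` names, the `SLOW_UI` example, and the `is_p0` helper are hypothetical labels chosen for illustration.

```python
from enum import Enum


class Symptom(Enum):
    """Hypothetical labels for the conditions listed above."""
    FATAL_SYSTEM_FAILURE = "server down or unresponsive"
    BUILDS_NOT_RUNNING = "builds not running"
    CRITICAL_OUTAGE = "outage impacting critical system operations"
    SECURITY_BREACH = "security breach"
    EXPIRED_LICENSE = "expired license preventing operations"
    # Assumed example of a symptom that is NOT on the P0 list.
    SLOW_UI = "web UI slower than usual"


# The five conditions this guide names as P0.
P0_SYMPTOMS = frozenset({
    Symptom.FATAL_SYSTEM_FAILURE,
    Symptom.BUILDS_NOT_RUNNING,
    Symptom.CRITICAL_OUTAGE,
    Symptom.SECURITY_BREACH,
    Symptom.EXPIRED_LICENSE,
})


def is_p0(observed):
    """Return True if any observed symptom matches a P0 condition."""
    return any(symptom in P0_SYMPTOMS for symptom in observed)


print(is_p0([Symptom.BUILDS_NOT_RUNNING]))  # True
print(is_p0([Symptom.SLOW_UI]))             # False
```

A check like this is only a pre-screen on your side; the severity of a ticket is still confirmed by the Support team.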

How to Report an Incident

Submit a Zendesk ticket and mark it as P0/Urgent
Provide a support bundle from your server installation
If possible, run Reality Check before submitting

Our Support team will review your ticket to confirm the severity and rule out external factors like cloud provider outages.

What Happens Next?

Step 1: Initial Response

A Support Engineer will reach out and start a Zoom call with you
They'll perform basic checks to understand the scope and impact
If engineering escalation is needed, the Support Engineer will bring in Engineering

Step 2: Incident Declaration

An Incident Commander will be called in, usually a CircleCI Engineering Manager
Members of the appropriate engineering team will be added to the Zoom call as needed

Step 3: Active Response

You'll work directly with:

Support Engineer: Your primary point of contact, who keeps you informed with regular updates
Incident Commander: Coordinates the technical response and ensures the right resources are engaged
Engineering Response Team: One or more engineers from the team that owns the affected service, who join as needed to investigate and resolve the issue

Step 4: Resolution

The team works continuously until:

The issue is resolved, OR
We mutually agree to pause and resume at a scheduled time

During the Incident

What you can expect:

Regular updates every 30 minutes on progress and current state
Direct access to engineering resources via Zoom
Clear communication about what's being tried and what we're learning about your situation
Coordination with your Field Engineer if necessary
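
To make the 30-minute cadence concrete, here is a minimal Python sketch that lists when updates would be expected. It assumes the clock starts when the incident call begins, which this guide does not specify; `expected_updates` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Update cadence stated in this guide.
UPDATE_INTERVAL = timedelta(minutes=30)


def expected_updates(started_at, count):
    """Return the next `count` expected update times after `started_at`.

    Assumption: the first update is due one interval after the call starts.
    """
    return [started_at + UPDATE_INTERVAL * (i + 1) for i in range(count)]


call_started = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
for due in expected_updates(call_started, 3):
    print(due.strftime("%H:%M"))
# prints 09:30, 10:00, 10:30 on separate lines
```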

What we need from you:

Access to logs, metrics, and system information
Details about recent changes to your environment
Availability of team members who can provide context or make necessary changes

After Resolution

Within 7 days of resolution, you'll receive:

A detailed Root Cause Analysis (RCA) document explaining what happened
Specific corrective actions we're taking to prevent similar incidents
Recommendations for your environment if applicable
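
If you track follow-ups in your own tooling, the 7-day window translates into a simple date calculation. This is a minimal Python sketch; `rca_due_by` is a hypothetical helper, and it assumes "7 days" means calendar days counted from the resolution timestamp.

```python
from datetime import datetime, timedelta, timezone

# RCA delivery window stated in this guide (assumed to be calendar days).
RCA_WINDOW = timedelta(days=7)


def rca_due_by(resolved_at):
    """Return the latest time by which the RCA document is expected."""
    return resolved_at + RCA_WINDOW


resolved = datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc)
print(rca_due_by(resolved).isoformat())  # 2024-03-08T14:30:00+00:00
```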

We also conduct internal Post Incident Reviews to continuously improve our response process and product reliability.

Important Notes

Support bundles are critical: These contain the diagnostic information we need to troubleshoot quickly
Zoom calls: We'll record the incident response Zoom call for our internal documentation
No after-hours delays: If your incident occurs outside business hours, our on-call team will respond

Before You Need Us

To prepare for potential incidents:

Know how to generate a support bundle from your installation
Document recent changes to your environment
Identify who on your team can authorize system changes during an incident

Questions?

If you have questions about our incident response process or want to discuss your specific environment, please reach out to your Technical Success Manager or Field Engineer.
