
Troubleshooting Self-Hosted Runners (Machine Runner & Container Runner)

Overview

This article is a central troubleshooting reference for both runner types:

  • Machine Runner 3.x — an agent installed directly on a VM or physical machine (Linux, macOS, Windows)

  • Container Runner — a Helm-deployed agent that schedules jobs as pods in a Kubernetes cluster

If you are still using Launch Agent 1.x, stop here and migrate first; see Issue 4: Launch Agent 1.x Jobs Are Failing (EOL) below.


Quick Pre-Checks

Before diving into specific issues, confirm the following:

  • Runner is registered and visible: Org Settings → Self-Hosted Runners, then confirm the resource class appears and shows a runner

  • Runner version: circleci-runner --version (machine runner) or helm list -n <namespace> (container runner)

  • Resource class name in config matches exactly: names are case-sensitive, so my-org/my-runner ≠ my-org/My-Runner

  • Runner has outbound internet access to runner.circleci.com: port 443 required

  • Runner token is valid and not rotated: if the token was recently rotated, restart the runner process with the new token


Issue 1: "We cannot run this job using the selected resource class"

Symptom: The job fails immediately with:

We cannot run this job using the selected resource class.

Cause A — Resource class does not exist

Verify the resource class was created:

circleci runner resource-class list <your-namespace>

If missing, create it:

circleci runner resource-class create <your-namespace>/<resource-class-name> "description"

Cause B — Runner is not enabled for your plan

Self-hosted runners require a Scale, Custom, or Server plan. Performance and Free plans do not have access. Check at Org Settings → Plan.

Cause C — Typo in config.yml

The resource class in your config must exactly match what was created. Check for capitalization differences, leading/trailing spaces, or namespace mismatches:

# Must match the registered resource class exactly
resource_class: my-org/my-runner-name
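To narrow down which kind of mismatch is present, the two strings can be compared mechanically. The following is a minimal sketch only: compare_resource_class is a hypothetical helper (not part of the CircleCI CLI) that takes the registered name and the value from config.yml as plain arguments.

```shell
# Hypothetical helper: compare the registered resource class ($1) with the
# value used in config.yml ($2) and print the kind of mismatch, if any.
compare_resource_class() {
  registered=$1
  configured=$2
  if [ "$registered" = "$configured" ]; then
    echo "match"
    return 0
  fi
  # Strip leading/trailing whitespace from the configured value.
  trimmed=$(printf '%s' "$configured" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  if [ "$registered" = "$trimmed" ]; then
    echo "whitespace mismatch"
    return 1
  fi
  # Compare again with both values lowercased.
  reg_lower=$(printf '%s' "$registered" | tr '[:upper:]' '[:lower:]')
  cfg_lower=$(printf '%s' "$trimmed" | tr '[:upper:]' '[:lower:]')
  if [ "$reg_lower" = "$cfg_lower" ]; then
    echo "case mismatch"
    return 1
  fi
  # Compare the namespace (the text before the first slash).
  if [ "${registered%%/*}" != "${trimmed%%/*}" ]; then
    echo "namespace mismatch"
    return 1
  fi
  echo "name mismatch"
  return 1
}
```

For example, compare_resource_class my-org/my-runner my-org/My-Runner reports a case mismatch.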

Issue 2: Jobs Queued or Stuck in "Not Running" / "Preparing Environment"

Check 1 — Confirm at least one runner is online

Go to Org Settings → Self-Hosted Runners. If the resource class shows "No runners" or all runners appear offline, the runner process has stopped or lost connectivity.

Check 2 — Review maxConcurrentTasks

Each resource class has a maxConcurrentTasks limit (default: 20). If this limit is reached, additional jobs queue even if runner machines appear idle. Contact CircleCI Support to request an increase.

Check 3 — Inspect runner logs

See Runner Log File Locations below. Look for:

  • failed to claim task — runner cannot reach the CircleCI backend

  • context deadline exceeded — network timeout to runner.circleci.com

  • token is invalid — runner token was rotated; restart the runner with the new token
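When the log is long, it can help to tally how often each of these messages appears. The sketch below assumes the messages appear verbatim somewhere in each log line; diagnose_log_line and summarize_log are hypothetical helper names, and the short labels they print are this example's own wording.

```shell
# Map one runner log line to the likely cause from the list above.
diagnose_log_line() {
  case "$1" in
    *"failed to claim task"*)      echo "backend unreachable" ;;
    *"context deadline exceeded"*) echo "network timeout" ;;
    *"token is invalid"*)          echo "token rotated" ;;
    *)                             echo "unrecognized" ;;
  esac
}

# Read log lines on stdin and print a count per recognized cause.
summarize_log() {
  while IFS= read -r line; do
    cause=$(diagnose_log_line "$line")
    if [ "$cause" != "unrecognized" ]; then
      echo "$cause"
    fi
  done | sort | uniq -c
}
```

On a systemd host this could be fed with journalctl -u circleci-runner --no-pager | summarize_log.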

Check 4 — For container runner, check pod status

kubectl get pods -n <namespace>
kubectl logs deployment/container-agent -n <namespace>

If the container-agent pod is not in Running state, see Issues 5 and 6 below.


Issue 3: Runner Appears Online but Jobs Are Not Being Claimed

Cause A — Runner is at maxConcurrentTasks capacity

If a previous batch of jobs did not release cleanly (e.g., machine rebooted mid-job), tasks may still be counted as active in the backend. Contact Support to clear stuck task claims.

Cause B — Runner cannot reach the task assignment endpoint

The runner must be able to reach:

  • runner.circleci.com:443

  • *.circle-artifacts.com (for artifact and cache operations)

Test from the runner machine:

curl -I https://runner.circleci.com/api/v3/runner/unclaim
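If the curl test fails, its exit code usually indicates which network layer is at fault. The sketch below maps the exit codes documented by curl to a short diagnosis. The helper names and labels are hypothetical, and the 10-second timeout is an arbitrary choice.

```shell
# Translate a curl exit code into a short network diagnosis.
explain_curl_exit() {
  case "$1" in
    0)     echo "reachable" ;;
    5)     echo "proxy DNS failure" ;;
    6)     echo "DNS failure" ;;
    7)     echo "connection failed" ;;
    28)    echo "timeout" ;;
    35|60) echo "TLS failure" ;;
    *)     echo "other curl error $1" ;;
  esac
}

# Probe a host over HTTPS and print the diagnosis.
# Any HTTP response (even 4xx) counts as reachable, because --fail is not set.
check_host() {
  curl --silent --head --max-time 10 --output /dev/null "https://$1/"
  explain_curl_exit "$?"
}
```

Running check_host runner.circleci.com from the runner machine prints one of the labels above. A TLS failure can also result from the clock skew described in Cause C.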

Cause C — Clock skew on the runner machine

TLS certificate validation requires the system clock to be within a few minutes of actual time. If the clock is skewed, authentication will fail silently. Verify NTP is configured and the clock is accurate (timedatectl on Linux).
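One rough way to estimate skew is to compare the local clock with the Date header of an HTTPS response. The helpers below do only the arithmetic. The 120-second default tolerance is an assumption for illustration, not a documented CircleCI limit, and the function names are hypothetical.

```shell
# Print the absolute difference in seconds between two epoch timestamps.
clock_skew_seconds() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  echo "$diff"
}

# Succeed if the skew between $1 and $2 is within the tolerance $3
# (defaults to 120 seconds, an assumed value).
skew_ok() {
  [ "$(clock_skew_seconds "$1" "$2")" -le "${3:-120}" ]
}
```

On Linux with GNU date, the remote time could be obtained with remote=$(curl -sI https://runner.circleci.com/ | tr -d '\r' | sed -n 's/^[Dd]ate: //p') and then compared using skew_ok "$(date +%s)" "$(date -d "$remote" +%s)".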


Issue 4: Launch Agent 1.x Jobs Are Failing (EOL)

Support for Launch Agent 1.x ended on September 17, 2024. Any runner still running a 1.x version will fail.

Symptoms:

  • Jobs fail immediately with no useful error in the job output

  • Runner logs show authentication or connection errors with no clear cause

Action required: Migrate to Machine Runner 3.x

The migration is straightforward — the configuration file is 1:1 compatible. No config changes are required.

# macOS (Homebrew)
brew install circleci-runner
# Linux (Debian/Ubuntu)
apt install circleci-runner
# Linux (RHEL/CentOS)
yum install circleci-runner

After installing, your existing config file (launch-agent-config.yaml) works without modification:

circleci-runner start --config launch-agent-config.yaml


Issue 5: Container Runner — Jobs Stuck in "Task Lifecycle" Stage (K8s Throttling)

Symptom: Jobs hang in the "Task lifecycle" stage. Container-agent logs show:

waited for 3s due to client-side throttling, not priority and fairness, request: ...

Cause: The single container-agent pod is saturating the Kubernetes API rate limits under high task concurrency.

Fix: Increase the replica count in values.yaml:

agent:
  replicaCount: 2

Apply the change:

helm upgrade container-agent container-agent/container-agent -n <namespace> -f values.yaml
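To judge whether the change helped, the throttling messages can be counted before and after the upgrade. This is a minimal sketch: count_throttling is a hypothetical helper that counts lines containing the "client-side throttling" text shown above.

```shell
# Read container-agent logs on stdin and print how many lines
# report client-side throttling (prints 0 when there are none).
count_throttling() {
  grep -c 'client-side throttling' || true
}
```

For example: kubectl logs deployment/container-agent -n <namespace> --since=10m | count_throttling. With more than one replica, kubectl logs on a deployment reads from a single pod, so each pod may need to be checked in turn.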


Issue 6: Container Runner — Pods Remain in "Pending" State

Possible causes and how to check each:

  • Node out of memory (OOM): run kubectl describe node <node-name> and look for MemoryPressure: True

  • Node disk pressure: run kubectl describe node <node-name> and look for DiskPressure: True

  • No nodes match pod affinity/tolerations: run kubectl describe pod <task-pod-name> -n <namespace> and look for Unschedulable events

  • Image pull failure: run kubectl describe pod <task-pod-name> -n <namespace> and look for ImagePullBackOff or ErrImagePull

For image pull issues with a private registry, see How to use imagePullSecrets on Container Runner.
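The checks above can be combined into one pass over saved kubectl describe output. The sketch below is an approximation: the exact layout of describe output varies between Kubernetes versions, so the patterns are assumptions, and the function names and labels are hypothetical.

```shell
# Succeed if the text ($1) contains the extended regex ($2).
describe_has() {
  printf '%s\n' "$1" | grep -Eq "$2"
}

# Read `kubectl describe` output on stdin and print one line for each
# Pending-pod cause from the list above that it appears to show.
classify_describe_output() {
  text=$(cat)
  if describe_has "$text" 'MemoryPressure:?[[:space:]]+True'; then
    echo "node memory pressure"
  fi
  if describe_has "$text" 'DiskPressure:?[[:space:]]+True'; then
    echo "node disk pressure"
  fi
  if describe_has "$text" 'Unschedulable|FailedScheduling'; then
    echo "unschedulable"
  fi
  if describe_has "$text" 'ImagePullBackOff|ErrImagePull'; then
    echo "image pull failure"
  fi
  return 0
}
```

For example: kubectl describe pod <task-pod-name> -n <namespace> | classify_describe_output. No output means none of the four patterns matched.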


Issue 7: OIDC Tokens Not Available in Runner Jobs

Symptom: $CIRCLE_OIDC_TOKEN is empty or the job fails when trying to use it.

Cause: OIDC token generation writes a file to /tmp. If /tmp is mounted with the noexec flag (common in hardened environments), this fails silently.

Diagnose:

mount | grep /tmp
# Look for "noexec" in the output
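A plain grep also matches mounts under /tmp, such as /tmp/work. The sketch below is a stricter check that assumes the Linux mount output format (device on /tmp type fstype (options)); tmp_is_noexec is a hypothetical helper name.

```shell
# Read `mount` output on stdin; succeed only if the mount point that is
# exactly /tmp lists noexec among its options (Linux output format assumed).
tmp_is_noexec() {
  awk '$2 == "on" && $3 == "/tmp" && $NF ~ /[(,]noexec[,)]/ { found = 1 }
       END { exit !found }'
}
```

Usage: mount | tmp_is_noexec && echo "/tmp is mounted noexec". If /tmp is not a separate mount point, the check fails and the options of the root filesystem apply instead.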

Fix options:

  1. Remove the noexec flag from /tmp if your security policy permits.

  2. Configure the runner to use an alternative working directory that allows execution.

  3. Use a native credential mechanism (AWS IAM instance profiles, GCP Workload Identity) instead of OIDC on that runner.


Issue 8: "fork/exec /bin/bash: bad file descriptor" (Container Runner)

Symptom:

failed to start cmd: fork/exec /bin/bash: bad file descriptor

Cause: The job's Docker image does not have /bin/bash, or the image entrypoint conflicts with the runner's task agent.

Fix:

  1. Ensure the image includes bash (RUN apt-get install -y bash), or use an image that includes it.

  2. Explicitly set the shell in your job config:

jobs:
  my-job:
    shell: /bin/sh -eo pipefail

Issue 9: SSH Debugging Not Working on Self-Hosted Runners

Container Runner does not support SSH debugging. This is a current product limitation — "Rerun job with SSH" is not available for container runner jobs.

Machine Runner does support SSH reruns. If it's not working, verify:

  • Project Settings → Advanced → Enable SSH reruns is turned on

  • The runner machine is network-accessible from your IP on the SSH port


Runner Log File Locations

Machine Runner 3.x

Log location by OS:

  • Linux (systemd): journalctl -u circleci-runner -f

  • Linux (file): /var/log/circleci-runner/circleci-runner.log

  • macOS: ~/Library/Logs/com.circleci.runner/circleci-runner.log

  • Windows: C:\ProgramData\CircleCI\circleci-runner.log

To increase log verbosity, set log_level: debug in the runner config file and restart the service.

Container Runner

# Container agent logs
kubectl logs deployment/container-agent -n <namespace> --tail=200
# Logs for a specific task pod
kubectl logs <task-pod-name> -n <namespace>
# Events (most useful for Pending pods)
kubectl describe pod <task-pod-name> -n <namespace>

When Escalating to Support

Include the following in your ticket to avoid back-and-forth:

  • Runner type: Machine Runner or Container Runner

  • Runner version: circleci-runner --version or Helm chart version (helm list -n <namespace>)

  • Resource class name exactly as it appears in config.yml

  • OS and version (machine runner) or Kubernetes version and cloud provider (container runner)

  • Runner logs from the time window of the failure

  • The specific failing job URL from app.circleci.com

  • Output of circleci runner resource-class list <namespace>

  • Whether the issue is intermittent or consistent
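Part of this list can be gathered by a small script on a machine runner host. The sketch below is one possible approach, not an official tool: collect_support_info, its default file name, and the report layout are all hypothetical, and it only runs the two commands already mentioned above when they are found on PATH.

```shell
# Write basic runner details to a file ($1) for attaching to a ticket.
# $2 is the CircleCI namespace used for the resource-class listing.
collect_support_info() {
  out=${1:-runner-support.txt}
  namespace=${2:-}
  {
    echo "Collected: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "OS: $(uname -sr)"
    echo "Runner version:"
    if command -v circleci-runner >/dev/null 2>&1; then
      circleci-runner --version 2>&1 || true
    else
      echo "  circleci-runner not found on PATH"
    fi
    echo "Resource classes:"
    if command -v circleci >/dev/null 2>&1 && [ -n "$namespace" ]; then
      circleci runner resource-class list "$namespace" 2>&1 || true
    else
      echo "  skipped (circleci CLI or namespace missing)"
    fi
  } > "$out"
  # Print the path of the file that was written.
  echo "$out"
}
```

For example, collect_support_info support.txt my-org writes the report to support.txt. Runner logs and the failing job URL still need to be added by hand.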


