Troubleshooting concurrency issues

When limiting concurrency, you might run into issues until you get the configuration right.

Runs going to STARTED status and skipping QUEUED

info

This only applies to Dagster Open Source.

If you are running a version older than 1.10.0, you may need to manually configure your deployment to enable run queueing by setting the run_queue key in your instance's settings. In the Dagster UI, navigate to Deployment > Configuration and verify that the run_queue key is set.
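As an illustration, a minimal `run_queue` entry in your instance's `dagster.yaml` might look like the sketch below. The `max_concurrent_runs` value is illustrative only; check the available keys against the instance configuration reference for your deployed version.

```yaml
# dagster.yaml (instance settings) -- minimal sketch, values are illustrative
run_queue:
  max_concurrent_runs: 10
```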

Runs remaining in QUEUED status

The possible causes for runs remaining in QUEUED status depend on whether you're using Dagster+ or Dagster Open Source.

If runs aren't being dequeued in Dagster+, the root causes could be:

  • If using a Hybrid deployment, the agent serving the deployment may be down. In this situation, runs will be paused.
  • Dagster+ is experiencing downtime. For the latest on potential outages, check the Dagster+ status page.

Runs blocked by op concurrency limits from cancelled runs

If runs are stuck in QUEUED status and the daemon logs show messages like:

Run <id> is blocked by global concurrency limits:
{"my_pool": {"pending_step_count": 1, "pending_step_run_ids": ["<cancelled_run_id>"]}}

where the blocking run is in CANCELED or FAILURE status, this means a cancelled or failed run left stale concurrency slot claims that were never cleaned up. With a pool limit of 1 (common for single-writer databases like DuckDB), a single stale claim will permanently block all future runs for that concurrency key.
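To make the failure mode concrete, here is a small in-memory model of a pool with a limit of 1. This is an illustration of the blocking logic described above, not Dagster's actual implementation; the `Pool` class and its method names are invented for this sketch.

```python
class Pool:
    """Toy model of a concurrency pool: illustration only, not Dagster internals."""

    def __init__(self, limit):
        self.limit = limit
        self.claims = set()  # run IDs currently holding a slot

    def try_claim(self, run_id):
        # A run can only start a step if a slot is free.
        if len(self.claims) >= self.limit:
            return False  # blocked: all slots held, possibly by a dead run
        self.claims.add(run_id)
        return True

    def free_slots_for_finished_runs(self, finished_run_ids):
        # Models what free_slots_after_run_end_seconds enables: reclaiming
        # slots still held by runs that reached a terminal status.
        self.claims -= finished_run_ids


pool = Pool(limit=1)
pool.try_claim("cancelled_run")           # slot taken; run is then cancelled
blocked = not pool.try_claim("new_run")   # pool is deadlocked by the stale claim
pool.free_slots_for_finished_runs({"cancelled_run"})
unblocked = pool.try_claim("new_run")     # slot reclaimed, new run proceeds
```

With a limit of 1, a single stale claim is enough to block every subsequent run, which is why the cleanup setting below matters.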

Cause: By default, Dagster does not automatically free op concurrency slots when a run is cancelled or fails. The cleanup mechanism exists but must be explicitly enabled.

Fix: Add free_slots_after_run_end_seconds to your run monitoring configuration:

run_monitoring:
  enabled: true
  free_slots_after_run_end_seconds: 300

This tells the daemon to automatically free concurrency slots held by finished runs after the specified number of seconds.

Immediate recovery: If you are currently deadlocked, you can free stale slots immediately from the Dagster UI. Navigate to Deployment > Concurrency and manually release the slots held by the cancelled run. Alternatively, you can reconfigure the pool's limit from the dagster CLI:

dagster instance concurrency set <pool_name> <limit>