Workflow finding type race condition — analysis and proposed fix

Workflow Finding Type Race Condition

Problem

When Workflow A calls Workflow B via a run_workflow node, both workflows must share the same entity_name_type (finding type). This is validated at save time and at every WorkflowManager instantiation. However, entity_name_type lives on the mutable Workflow record — not in the versioned snapshot — so it can be changed at any time via PUT /workflows/<cuid>, breaking running executions.

Root Cause

Two separate API paths update a workflow:

| Operation | Endpoint | Versioned? |
| --- | --- | --- |
| Settings (entity_name_type, trigger, title…) | PUT /workflows/<cuid> | No; writes directly to the workflows table |
| Graph (nodes, edges) | POST /workflows/<cuid>/<version> | Yes; creates a new version if active executions exist |

The Version source JSON only contains {nodes, edges}. All entity context (entity_name_type, entity_name, primary_record_type_id) is read from the live Workflow record at runtime. When resuming an execution, validate_source() compares the current entity_name_type values — not the values at the time the execution started.

Data flow on resume

execution.get_workflow_manager()
  └─> WorkflowManager.__init__(workflow=self.workflow, version=self.version)
        └─> version_to_use.validate_source()
              └─> NodesManager.validate_nodes_data(
                    workflow=self.workflow,     ← CURRENT mutable record
                    nodes=version.source["nodes"] ← frozen graph
                  )
                  └─> _validate_run_workflow_node():
                        workflow.entity_name_type  (current, mutable)
                        vs target_workflow.entity_name_type (current, mutable)
                        → WorkflowValidationError if mismatch
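
The resume path above can be reduced to a toy model that shows why a settings edit breaks a paused execution. This is a hypothetical stand-in, not the project's code: the dataclasses, the registry dict, the node keys (type, target_cuid), and the finding-type strings are assumptions made for illustration. Only the names Workflow, Version, validate_source, and WorkflowValidationError come from this write-up.

```python
from dataclasses import dataclass


class WorkflowValidationError(Exception):
    """Stand-in for the real exception; only the name comes from the write-up."""


@dataclass
class Workflow:
    # Mutable settings record: entity_name_type can change at any time.
    cuid: str
    entity_name_type: str


@dataclass(frozen=True)
class Version:
    # Frozen snapshot: stores only the graph, no entity context.
    source: dict


def validate_source(workflow, version, registry):
    """Compare the live finding types of the caller and each run_workflow target."""
    for node in version.source["nodes"]:
        if node["type"] != "run_workflow":
            continue
        target = registry[node["target_cuid"]]  # hypothetical lookup by cuid
        if workflow.entity_name_type != target.entity_name_type:
            raise WorkflowValidationError(
                f"Incompatible entity name types: {workflow.entity_name_type!r} "
                f"vs {target.entity_name_type!r}"
            )


def resume(workflow, version, registry):
    """Re-validate on every resume, mirroring the call in WorkflowManager.__init__."""
    validate_source(workflow, version, registry)
    return "resumed"


child = Workflow("wf3", "model_limitation")
parent = Workflow("wf4", "model_limitation")
registry = {w.cuid: w for w in (child, parent)}
version = Version(
    {"nodes": [{"type": "run_workflow", "target_cuid": "wf3"}], "edges": []}
)

first_resume = resume(parent, version, registry)  # types match, validation passes

parent.entity_name_type = "policy_exception"  # settings edit while execution is paused
try:
    second_resume = resume(parent, version, registry)
except WorkflowValidationError as exc:
    second_resume = f"blocked: {exc}"
```

The version object never changes between the two calls; only the live Workflow record does, which is enough to flip the outcome.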

Reproduction

Setup

  • Workflow #3 — "Model Limitation Artifact Workflow 1", finding type = Model Limitation
  • Workflow #4 — "Policy Exception Artifact Workflow 1", finding type = Policy Exception → changed to Model Limitation
  • Workflow #4 has a run_workflow node pointing to Workflow #3
  • With both workflows set to the same finding type (Model Limitation), Workflow #4 saves successfully and an execution can be started

Steps

  1. Start execution of Workflow #4 on a Policy Exception artifact
  2. Execution pauses at user_action node (waiting for user input)
  3. Edit Workflow #4 settings → change entity_name_type back to Policy Exception (different from Workflow #3's Model Limitation)
  4. Return to the running execution → submit the user action form

Result

WorkflowValidationError:
  Run Workflow: Incompatible entity name types.
  Current workflow has type 'cmnnax6gd002m73i5xjmwdn9q'
  but referenced workflow has type 'cmnnax6ha002q73i5cqfdkjnn'.

The execution is stuck: every interaction re-triggers the same validation, and it stays blocked until the finding type is reverted through the UI or directly in the database.

Key Code Locations

| File | Lines | What |
| --- | --- | --- |
| src/backend/db/workflow.py | 231-232 | entity_name_type column on Workflow |
| src/backend/db/workflow.py | 2977-3030 | Version model: source only stores nodes/edges |
| src/backend/db/workflow.py | 2012-2150 | update_workflow: no guards for active executions |
| src/backend/db/workflow.py | 3137-3162 | validate_source: uses self.workflow (current, not snapshot) |
| src/backend/db/workflow.py | 2549-2637 | get_workflow_dependencies: finds workflows referencing this one |
| src/backend/workflows/managers.py | 1258-1322 | _validate_run_workflow_node: the check that fails |
| src/backend/workflows/managers.py | 2163-2169 | validate_source() called in WorkflowManager.__init__ |
| src/backend/handlers/workflows_handlers.py | 50-117 | Handler that calls update_workflow |

Proposed Solution: Guard at Settings Update Time

Add checks in Workflow.update_workflow() (src/backend/db/workflow.py:2012) when entity_name_type is changing:

Guard 1: Block if this workflow has active executions

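# Relies on module-level imports: from sqlalchemy import func, select
# Guard only when the finding type is actually changing.
if entity_name_type != workflow.entity_name_type:
    # Count this workflow's executions that have not finished yet.
    active_count = db.session.execute(
        select(func.count()).select_from(Execution).filter(
            Execution.workflow_id == workflow.id,
            Execution.status.in_([
                Execution.STATUS_ACTIVE,
                Execution.STATUS_WAITING,
                Execution.STATUS_SCHEDULED,
            ])
        )
    ).scalar_one()
    if active_count > 0:
        # Reject the settings update instead of stranding a paused execution.
        raise BadRequestError(
            "Cannot change finding type while this workflow has active executions."
        )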

Guard 2: Block if other workflows reference this one via run_workflow and have active executions

get_workflow_dependencies(cuid) already returns workflows that have run_workflow nodes pointing to this workflow (used today to block deletion). Extend the check:

# Relies on module-level imports: from sqlalchemy import func, select
if entity_name_type != workflow.entity_name_type:
    # Workflows whose run_workflow nodes point at this workflow.
    deps = cls.get_workflow_dependencies(workflow.cuid)
    dependent_workflows = deps.get("dependent_workflows", [])
    if dependent_workflows:
        # Check if any dependent workflow has active executions
        dep_cuids = [dw["cuid"] for dw in dependent_workflows]
        dep_active = db.session.execute(
            select(func.count()).select_from(Execution).filter(
                # Resolve the dependents' cuids to ids in a subquery.
                Execution.workflow_id.in_(
                    select(cls.id).filter(cls.cuid.in_(dep_cuids))
                ),
                Execution.status.in_([
                    Execution.STATUS_ACTIVE,
                    Execution.STATUS_WAITING,
                    Execution.STATUS_SCHEDULED,
                ])
            )
        ).scalar_one()
        if dep_active > 0:
            raise BadRequestError(
                "Cannot change finding type: workflows referencing this one "
                "have active executions."
            )
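
To try Guards 1 and 2 without the application, the two count queries can be expressed against an in-memory SQLite database. This is a minimal sketch under stated assumptions: the executions table, its columns, the lowercase status strings, and the helper name assert_finding_type_change_allowed are all hypothetical, and the list of dependent workflow ids is passed in directly instead of coming from get_workflow_dependencies().

```python
import sqlite3

# Assumed status values; the real ones live on Execution.STATUS_*.
ACTIVE_STATUSES = ("active", "waiting", "scheduled")


class BadRequestError(Exception):
    """Stand-in for the application's error type."""


def live_count(conn, workflow_ids):
    """Count unfinished executions belonging to any of the given workflows."""
    if not workflow_ids:
        return 0
    id_marks = ",".join("?" for _ in workflow_ids)
    status_marks = ",".join("?" for _ in ACTIVE_STATUSES)
    row = conn.execute(
        f"SELECT COUNT(*) FROM executions "
        f"WHERE workflow_id IN ({id_marks}) AND status IN ({status_marks})",
        (*workflow_ids, *ACTIVE_STATUSES),
    ).fetchone()
    return row[0]


def assert_finding_type_change_allowed(conn, workflow_id, dependent_ids):
    """Raise if the workflow, or any workflow referencing it, has live executions."""
    if live_count(conn, [workflow_id]) > 0:  # Guard 1
        raise BadRequestError(
            "Cannot change finding type while this workflow has active executions."
        )
    if live_count(conn, list(dependent_ids)) > 0:  # Guard 2
        raise BadRequestError(
            "Cannot change finding type: workflows referencing this one "
            "have active executions."
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE executions (id INTEGER PRIMARY KEY, workflow_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO executions (workflow_id, status) VALUES (?, ?)",
    [(3, "completed"), (4, "waiting")],  # Workflow 4 is paused at a user action
)


def check(workflow_id, dependent_ids):
    """Return 'allowed' or the guard's error message."""
    try:
        assert_finding_type_change_allowed(conn, workflow_id, dependent_ids)
        return "allowed"
    except BadRequestError as exc:
        return str(exc)


result_wf4 = check(4, [])         # Guard 1: Workflow 4 itself is waiting
result_wf3 = check(3, [4])        # Guard 2: Workflow 4 references Workflow 3
result_wf3_alone = check(3, [])   # no live executions anywhere
```

Running Guard 1 before Guard 2 keeps the cheaper single-workflow query first and gives the user the more specific message when both conditions hold.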

Guard 3 (optional): Block if this workflow references other workflows with different finding type

A graph save already catches this case, because validate_source() runs at save time. A settings update does not call validate_source(), so if the referenced workflow's type has changed, the parent's next settings save will not detect the mismatch. Consider also running validate_source() on the latest version during a settings update whenever entity_name_type changes.
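
One way to picture Guard 3 is a settings update that validates the latest graph against the proposed finding type before committing it. The sketch below uses plain dicts and a hypothetical helper, update_entity_name_type; the node keys (type, target_cuid) and the lookup table are assumptions, and the real check would go through validate_source() instead of this inline loop.

```python
class WorkflowValidationError(Exception):
    """Stand-in; only the name is taken from the write-up."""


def update_entity_name_type(workflow, new_type, latest_nodes, workflows_by_cuid):
    """Apply new_type only if every run_workflow target in the latest graph matches it."""
    if new_type == workflow["entity_name_type"]:
        return workflow  # nothing changes, so nothing to validate
    for node in latest_nodes:
        if node.get("type") != "run_workflow":
            continue
        target = workflows_by_cuid[node["target_cuid"]]
        if target["entity_name_type"] != new_type:
            raise WorkflowValidationError(
                f"Referenced workflow {target['cuid']} has type "
                f"{target['entity_name_type']!r}, not {new_type!r}"
            )
    # Validation passed: only now mutate the record.
    workflow["entity_name_type"] = new_type
    return workflow


child = {"cuid": "wf3", "entity_name_type": "model_limitation"}
parent = {"cuid": "wf4", "entity_name_type": "model_limitation"}
nodes = [{"type": "user_action"}, {"type": "run_workflow", "target_cuid": "wf3"}]
lookup = {"wf3": child, "wf4": parent}

try:
    update_entity_name_type(parent, "policy_exception", nodes, lookup)
    outcome = "accepted"
except WorkflowValidationError:
    outcome = "rejected"
```

Because the record is mutated only after the loop completes, a rejected update leaves the workflow's finding type untouched.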

Why This Approach

  • Matches existing pattern: deletion already uses get_workflow_dependencies() — same guard, different mutation
  • Minimal scope: one method, two/three checks, no migration
  • Doesn't require versioning entity_name_type: that would need a migration + changes to Version model + changes to how validate_source resolves entity context
  • Closes the gap: save-time validation already works for the graph; this fix prevents the mismatch from being created after a valid save

Long-Term Consideration

Versioning entity_name_type alongside the node graph in source would make the system fundamentally resilient — executions would always use the finding type from when the version was created. But this is a larger change:

  • Schema migration to add entity_name_type to Version or source JSON
  • Changes to validate_source to read from version context instead of live Workflow
  • Migration path for existing versions without the field
  • Needs careful analysis of all places that read workflow.entity_name_type
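
The migration path for existing versions could be handled with a fallback at read time. This is a rough sketch of that idea, not a design decision from this write-up: storing entity_name_type as a key inside the source JSON, and the helper names snapshot_source and resolve_entity_name_type, are assumptions for illustration.

```python
def snapshot_source(nodes, edges, entity_name_type):
    """Build a version source that pins the finding type next to the graph."""
    return {"nodes": nodes, "edges": edges, "entity_name_type": entity_name_type}


def resolve_entity_name_type(version_source, live_workflow_type):
    """Prefer the pinned type; fall back to the live record for legacy versions."""
    pinned = version_source.get("entity_name_type")
    return pinned if pinned is not None else live_workflow_type


legacy_source = {"nodes": [], "edges": []}  # version created before the change
pinned_source = snapshot_source([], [], "model_limitation")

# The live record has since been edited to a different finding type.
legacy_type = resolve_entity_name_type(legacy_source, "policy_exception")
pinned_type = resolve_entity_name_type(pinned_source, "policy_exception")
```

Legacy versions keep today's behavior (they follow the live record), while new versions stay on the type they were created with, so no backfill is strictly required for the fallback to work.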

Additional Notes

  • entity_name (Finding vs InventoryModel) is not changeable via UI (frontend restriction only, backend accepts it). Not a concern for this issue.
  • active_execution_count() only exists on Version, not Workflow — the guard needs a direct query on Execution.workflow_id.