Workflow Extraction Vue Conversion Plan

Overview

Convert the legacy Mako-based workflow extraction UI (build_from_current_history.mako) to a modern Vue.js component with a FastAPI backend. This is the last non-data-display Mako template in Galaxy.

Related Issue: #17506


Code Review Summary

This plan was reviewed for Galaxy coding standards. Key findings:

Critical Issues (Fixed Below)

Issue | Location | Fix
----- | -------- | ---
Missing CBV wrapper | API endpoint | Need @router.cbv class, not standalone function
Missing error responses | API endpoint | Add 403/404 response documentation
No error handling | Service layer | Add try/except with proper exceptions
Wrong test base class | Test code | Use ApiTestCase, not BaseWorkflowPopulator

High Priority Issues (Fixed Below)

Issue | Location | Fix
----- | -------- | ---
Type annotation gaps | Service layer | Add return types, parameter types
Use isinstance() | Service layer | Replace getattr(job, 'is_fake', False) pattern
Test coverage incomplete | Test code | Expand from 4 tests to ~15 tests

Medium Priority Recommendations

Issue | Recommendation
----- | --------------
Long service method | Extract _build_extraction_job() helper
Large history performance | Consider pagination (see Unresolved Questions)
Job ID encoding inconsistent | Always use trans.security.encode_id()

Phase 1: Backend API

1.1 Pydantic Response Models

File: lib/galaxy/schema/workflow_extraction.py (new file)

from typing import Optional
from pydantic import BaseModel, Field


class WorkflowExtractionOutputDataset(BaseModel):
    """Represents a dataset or collection output from a job."""
    id: str = Field(..., description="Encoded ID of the dataset/collection")
    hid: int = Field(..., description="History ID number")
    name: str = Field(..., description="Display name")
    state: str = Field(..., description="Current state (ok, running, etc.)")
    deleted: bool = Field(..., description="Whether this output has been deleted")
    history_content_type: str = Field(..., description="'dataset' or 'dataset_collection'")


class WorkflowExtractionJob(BaseModel):
    """Represents a job available for workflow extraction."""
    id: str = Field(..., description="Encoded job ID (may be 'fake_*' for input datasets)")
    tool_name: str = Field(..., description="Human-readable tool name")
    tool_id: Optional[str] = Field(None, description="Tool ID (None for fake jobs)")
    tool_version: Optional[str] = Field(None, description="Tool version used")
    is_workflow_compatible: bool = Field(..., description="Whether tool can be used in workflows")
    is_fake: bool = Field(..., description="True for input datasets with no creating job")
    disabled_reason: Optional[str] = Field(None, description="Why this job cannot be included")
    outputs: list[WorkflowExtractionOutputDataset] = Field(
        default_factory=list,
        description="Datasets/collections created by this job"
    )
    version_warning: Optional[str] = Field(
        None,
        description="Warning if current tool version differs from job's version"
    )
    has_non_deleted_outputs: bool = Field(
        ...,
        description="True if at least one output is not deleted"
    )


class WorkflowExtractionSummary(BaseModel):
    """Summary data for workflow extraction UI."""
    history_id: str = Field(..., description="Encoded history ID")
    history_name: str = Field(..., description="History name")
    jobs: list[WorkflowExtractionJob] = Field(
        default_factory=list,
        description="Jobs available for extraction, ordered by output HID"
    )
    warnings: list[str] = Field(
        default_factory=list,
        description="Global warnings (e.g., 'Some datasets still queued')"
    )
    default_workflow_name: str = Field(
        ...,
        description="Suggested default workflow name"
    )
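
To make the response shape concrete, here is a minimal sketch constructing the models above with invented IDs and names. The serialization call shown is Pydantic v2's model_dump_json(); on Pydantic v1 it would be .json().

# Illustrative only -- all IDs and names below are invented.
example = WorkflowExtractionSummary(
    history_id="f2db41e1fa331b3e",
    history_name="RNA-seq analysis",
    jobs=[
        WorkflowExtractionJob(
            id="1cd8e2f6b131e891",
            tool_name="Concatenate datasets",
            tool_id="cat1",
            tool_version="1.0.0",
            is_workflow_compatible=True,
            is_fake=False,
            outputs=[
                WorkflowExtractionOutputDataset(
                    id="5a1cff6882ddb5b2",
                    hid=2,
                    name="Concatenated FASTA",
                    state="ok",
                    deleted=False,
                    history_content_type="dataset",
                )
            ],
            has_non_deleted_outputs=True,
        )
    ],
    default_workflow_name="Workflow constructed from history 'RNA-seq analysis'",
)
print(example.model_dump_json(indent=2))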

1.2 API Endpoint

File: lib/galaxy/webapps/galaxy/api/histories.py (add to FastAPIHistories class)

Review Note: Galaxy uses class-based views (CBV) with @router.cbv. The endpoint must be a method within the existing FastAPIHistories class, not a standalone function. Error responses also need to be documented.

from galaxy.schema.workflow_extraction import WorkflowExtractionSummary

# Add to existing FastAPIHistories class (already decorated with @router.cbv):

@router.get(
    "/api/histories/{history_id}/extraction_summary",
    summary="Get workflow extraction summary for a history.",
    description="Returns the jobs and datasets available for workflow extraction from this history.",
    responses={
        403: {"description": "Not authorized to access this history"},
        404: {"description": "History not found"},
    },
)
def extraction_summary(
    self,
    history_id: HistoryIDPathParam,
    trans: ProvidesHistoryContext = DependsOnTrans,
) -> WorkflowExtractionSummary:
    """Get summary data needed to build a workflow from history contents."""
    return self.service.get_extraction_summary(trans, history_id)

Note: This method goes inside the existing FastAPIHistories class which is already wrapped with @router.cbv. Galaxy auto-discovers routers - no manual registration needed.

1.3 Service Layer

File: lib/galaxy/webapps/galaxy/services/histories.py (add method)

Review Notes:

  • Use isinstance() for FakeJob detection (not getattr(job, 'is_fake'))
  • Add error handling with proper Galaxy exceptions
  • Add type annotations for all parameters and return types
  • Extract helper method for job transformation
from typing import Union

from galaxy.exceptions import ObjectNotFound, ItemAccessibilityException
from galaxy.model import Job
from galaxy.workflow.extract import summarize, FakeJob, DatasetCollectionCreationJob
from galaxy.schema.workflow_extraction import (
    WorkflowExtractionSummary,
    WorkflowExtractionJob,
    WorkflowExtractionOutputDataset,
)

# Add to HistoriesService class:

def get_extraction_summary(
    self,
    trans: ProvidesHistoryContext,
    history_id: DecodedDatabaseIdField,
) -> WorkflowExtractionSummary:
    """Build extraction summary for workflow creation from history.

    Raises:
        ObjectNotFound: If history does not exist
        ItemAccessibilityException: If user cannot access history
    """
    try:
        history = self.manager.get_accessible(history_id, trans.user, current_history=trans.history)
    except ObjectNotFound:
        raise ObjectNotFound(f"History {history_id} not found")
    except ItemAccessibilityException:
        raise ItemAccessibilityException(f"Cannot access history {history_id}")

    # Use existing summarize() function
    jobs_dict, warnings = summarize(trans, history)

    # Transform to API response format
    extraction_jobs = [
        self._build_extraction_job(trans, job, datasets)
        for job, datasets in jobs_dict.items()
    ]

    return WorkflowExtractionSummary(
        history_id=trans.security.encode_id(history.id),
        history_name=history.name,
        jobs=extraction_jobs,
        warnings=list(warnings),
        default_workflow_name=f"Workflow constructed from history '{history.name}'",
    )

def _build_extraction_job(
    self,
    trans: ProvidesHistoryContext,
    job: Union[Job, FakeJob, DatasetCollectionCreationJob],
    datasets: list,
) -> WorkflowExtractionJob:
    """Transform a job and its outputs into an API response object."""
    is_fake = isinstance(job, (FakeJob, DatasetCollectionCreationJob))

    # Determine tool info
    tool_name = "Unknown"
    tool_id = None
    tool_version = None
    is_workflow_compatible = False
    disabled_reason = None
    version_warning = None

    if is_fake:
        tool_name = job.name if hasattr(job, 'name') and job.name else "Input Dataset"
        disabled_reason = job.disabled_why if hasattr(job, 'disabled_why') else "This item cannot be used in workflows"
    else:
        tool = trans.app.toolbox.get_tool(job.tool_id, tool_version=job.tool_version)
        tool_id = job.tool_id
        tool_version = job.tool_version
        if tool:
            tool_name = tool.name
            is_workflow_compatible = tool.is_workflow_compatible
            if not is_workflow_compatible:
                disabled_reason = "This tool cannot be used in workflows"
            elif tool.version != job.tool_version:
                version_warning = (
                    f'Dataset was created with tool version "{job.tool_version}", '
                    f'but workflow extraction will use version "{tool.version}".'
                )
        else:
            disabled_reason = "Tool not found"

    # Build output list
    outputs = []
    for output_name, data in datasets:
        outputs.append(WorkflowExtractionOutputDataset(
            id=trans.security.encode_id(data.id),
            hid=data.hid,
            name=data.display_name() if hasattr(data, 'display_name') else data.name,
            state=data.state or "queued",
            deleted=data.deleted,
            history_content_type=data.history_content_type,
        ))

    has_non_deleted = any(not o.deleted for o in outputs)

    # Encode job ID consistently
    if is_fake:
        job_id = str(job.id)  # FakeJob IDs are already strings like "fake_123"
    else:
        job_id = trans.security.encode_id(job.id)

    return WorkflowExtractionJob(
        id=job_id,
        tool_name=tool_name,
        tool_id=tool_id,
        tool_version=tool_version,
        is_workflow_compatible=is_workflow_compatible and not is_fake,
        is_fake=is_fake,
        disabled_reason=disabled_reason,
        outputs=outputs,
        version_warning=version_warning,
        has_non_deleted_outputs=has_non_deleted,
    )

Phase 2: Vue Component

2.1 API Client

File: client/src/api/workflowExtraction.ts (new file)

import { GalaxyApi } from "@/api";
import { rethrowSimple } from "@/utils/simple-error";

export interface ExtractionOutputDataset {
    id: string;
    hid: number;
    name: string;
    state: string;
    deleted: boolean;
    history_content_type: "dataset" | "dataset_collection";
}

export interface ExtractionJob {
    id: string;
    tool_name: string;
    tool_id: string | null;
    tool_version: string | null;
    is_workflow_compatible: boolean;
    is_fake: boolean;
    disabled_reason: string | null;
    outputs: ExtractionOutputDataset[];
    version_warning: string | null;
    has_non_deleted_outputs: boolean;
}

export interface ExtractionSummary {
    history_id: string;
    history_name: string;
    jobs: ExtractionJob[];
    warnings: string[];
    default_workflow_name: string;
}

export async function getExtractionSummary(historyId: string): Promise<ExtractionSummary> {
    const { data, error } = await GalaxyApi().GET("/api/histories/{history_id}/extraction_summary", {
        params: { path: { history_id: historyId } },
    });
    if (error) {
        rethrowSimple(error);
    }
    return data as ExtractionSummary;
}

export interface ExtractWorkflowPayload {
    from_history_id: string;
    workflow_name: string;
    job_ids: string[];
    dataset_ids: number[];
    dataset_collection_ids: number[];
    dataset_names?: string[];
    dataset_collection_names?: string[];
}

export async function extractWorkflow(payload: ExtractWorkflowPayload): Promise<{ id: string; url: string }> {
    const { data, error } = await GalaxyApi().POST("/api/workflows", {
        body: payload,
    });
    if (error) {
        rethrowSimple(error);
    }
    return data as { id: string; url: string };
}

2.2 Main Component

File: client/src/components/Workflow/WorkflowExtraction.vue (new file)

<script setup lang="ts">
import { faCheck, faExclamationTriangle, faSpinner } from "@fortawesome/free-solid-svg-icons";
import { FontAwesomeIcon } from "@fortawesome/vue-fontawesome";
import { BAlert, BFormCheckbox, BFormInput, BTable } from "bootstrap-vue";
import { computed, onMounted, ref } from "vue";
import { useRouter } from "vue-router/composables";

import {
    extractWorkflow,
    getExtractionSummary,
    type ExtractionJob,
    type ExtractionOutputDataset,
    type ExtractionSummary,
} from "@/api/workflowExtraction";
import { Toast } from "@/composables/toast";

import AsyncButton from "@/components/Common/AsyncButton.vue";
import Heading from "@/components/Common/Heading.vue";
import LoadingSpan from "@/components/LoadingSpan.vue";

interface Props {
    historyId: string;
}

const props = defineProps<Props>();
const router = useRouter();

// State
const loading = ref(true);
const loadError = ref<string | null>(null);
const summary = ref<ExtractionSummary | null>(null);
const workflowName = ref("");
const selectedJobs = ref<Set<string>>(new Set());
const inputDatasets = ref<Map<string, { hid: number; name: string; type: string }>>(new Map());

// Computed
const selectableJobs = computed(() => {
    if (!summary.value) return [];
    return summary.value.jobs.filter((j) => j.is_workflow_compatible && j.has_non_deleted_outputs);
});

const disabledJobs = computed(() => {
    if (!summary.value) return [];
    return summary.value.jobs.filter((j) => !j.is_workflow_compatible || j.is_fake);
});

const canSubmit = computed(() => {
    return workflowName.value.trim().length > 0 && (selectedJobs.value.size > 0 || inputDatasets.value.size > 0);
});

// Methods
async function loadSummary() {
    loading.value = true;
    loadError.value = null;
    try {
        summary.value = await getExtractionSummary(props.historyId);
        workflowName.value = summary.value.default_workflow_name;

        // Pre-select all workflow-compatible jobs with non-deleted outputs
        selectableJobs.value.forEach((job) => {
            selectedJobs.value.add(job.id);
        });
    } catch (e) {
        loadError.value = String(e);
    } finally {
        loading.value = false;
    }
}

function toggleJob(jobId: string) {
    if (selectedJobs.value.has(jobId)) {
        selectedJobs.value.delete(jobId);
    } else {
        selectedJobs.value.add(jobId);
    }
}

function toggleAllJobs(checked: boolean) {
    if (checked) {
        selectableJobs.value.forEach((job) => selectedJobs.value.add(job.id));
        // Also select every potential input dataset/collection
        disabledJobs.value.forEach((job) => {
            job.outputs.forEach((output) => {
                const key = `${output.history_content_type}_${output.hid}`;
                inputDatasets.value.set(key, {
                    hid: output.hid,
                    name: output.name,
                    type: output.history_content_type,
                });
            });
        });
    } else {
        selectedJobs.value.clear();
        inputDatasets.value.clear();
    }
}

function toggleInputDataset(output: ExtractionOutputDataset) {
    const key = `${output.history_content_type}_${output.hid}`;
    if (inputDatasets.value.has(key)) {
        inputDatasets.value.delete(key);
    } else {
        inputDatasets.value.set(key, {
            hid: output.hid,
            name: output.name,
            type: output.history_content_type,
        });
    }
}

function isInputSelected(output: ExtractionOutputDataset): boolean {
    return inputDatasets.value.has(`${output.history_content_type}_${output.hid}`);
}

async function onSubmit() {
    // Separate dataset and collection inputs
    const datasetIds: number[] = [];
    const datasetNames: string[] = [];
    const collectionIds: number[] = [];
    const collectionNames: string[] = [];

    inputDatasets.value.forEach((input) => {
        if (input.type === "dataset") {
            datasetIds.push(input.hid);
            datasetNames.push(input.name);
        } else {
            collectionIds.push(input.hid);
            collectionNames.push(input.name);
        }
    });

    try {
        const result = await extractWorkflow({
            from_history_id: props.historyId,
            workflow_name: workflowName.value,
            job_ids: Array.from(selectedJobs.value),
            dataset_ids: datasetIds,
            dataset_collection_ids: collectionIds,
            dataset_names: datasetNames,
            dataset_collection_names: collectionNames,
        });

        Toast.success(`Workflow "${workflowName.value}" created successfully!`);
        router.push(`/workflows/edit?id=${result.id}`);
    } catch (e) {
        Toast.error(`Failed to create workflow: ${e}`);
    }
}

onMounted(() => {
    loadSummary();
});
</script>

<template>
    <div class="workflow-extraction">
        <Heading h1 separator size="lg">Extract Workflow from History</Heading>

        <LoadingSpan v-if="loading" message="Loading history contents..." />

        <BAlert v-else-if="loadError" variant="danger" show>
            Failed to load history: {{ loadError }}
        </BAlert>

        <template v-else-if="summary">
            <p>
                The following list contains each tool that was run to create the datasets in your history.
                Please select those that you wish to include in the workflow.
            </p>
            <p class="text-muted">
                Tools which cannot be run interactively and thus cannot be incorporated into a workflow
                will be shown in gray.
            </p>

            <!-- Warnings -->
            <BAlert v-for="(warning, idx) in summary.warnings" :key="idx" variant="warning" show>
                {{ warning }}
            </BAlert>

            <!-- Workflow Name -->
            <div class="form-group mb-3">
                <label for="workflow-name" class="font-weight-bold">Workflow name</label>
                <BFormInput
                    id="workflow-name"
                    v-model="workflowName"
                    type="text"
                    placeholder="Enter workflow name"
                />
            </div>

            <!-- Action Buttons -->
            <div class="mb-3">
                <AsyncButton
                    color="blue"
                    :icon="faCheck"
                    title="Create Workflow"
                    :disabled="!canSubmit"
                    :action="onSubmit">
                    Create Workflow
                </AsyncButton>
                <button class="btn btn-secondary ml-2" @click="toggleAllJobs(true)">Check all</button>
                <button class="btn btn-secondary ml-2" @click="toggleAllJobs(false)">Uncheck all</button>
            </div>

            <!-- Jobs Table -->
            <table class="table table-bordered extraction-table">
                <thead>
                    <tr>
                        <th style="width: 47.5%">Tool</th>
                        <th style="width: 5%"></th>
                        <th style="width: 47.5%">History items created</th>
                    </tr>
                </thead>
                <tbody>
                    <tr v-for="job in summary.jobs" :key="job.id" :class="{ disabled: !job.is_workflow_compatible }">
                        <td>
                            <div class="tool-form" :class="{ 'tool-form-disabled': !job.is_workflow_compatible }">
                                <div class="tool-form-title">{{ job.tool_name }}</div>
                                <div class="tool-form-body">
                                    <template v-if="!job.is_workflow_compatible">
                                        <p class="text-muted font-italic">
                                            {{ job.disabled_reason || "This tool cannot be used in workflows" }}
                                        </p>
                                    </template>
                                    <template v-else>
                                        <BFormCheckbox
                                            :checked="selectedJobs.has(job.id)"
                                            @change="toggleJob(job.id)">
                                            Include "{{ job.tool_name }}" in workflow
                                        </BFormCheckbox>
                                        <BAlert v-if="!job.has_non_deleted_outputs" variant="info" show class="mt-2 mb-0">
                                            All job outputs have been deleted
                                        </BAlert>
                                        <BAlert v-if="job.version_warning" variant="warning" show class="mt-2 mb-0">
                                            {{ job.version_warning }}
                                        </BAlert>
                                    </template>
                                </div>
                            </div>
                        </td>
                        <td class="text-center align-middle">&#x25B6;</td>
                        <td>
                            <div v-for="output in job.outputs" :key="output.id" class="output-item">
                                <div
                                    class="dataset-item"
                                    :class="[`state-${output.state}`, { deleted: output.deleted }]">
                                    <span class="hid">{{ output.hid }}</span>
                                    <span class="name">{{ output.name }}</span>
                                </div>
                                <template v-if="!job.is_workflow_compatible">
                                    <BFormCheckbox
                                        :checked="isInputSelected(output)"
                                        class="ml-2"
                                        @change="toggleInputDataset(output)">
                                        Treat as input {{ output.history_content_type === 'dataset_collection' ? 'collection' : 'dataset' }}
                                    </BFormCheckbox>
                                    <BFormInput
                                        v-if="isInputSelected(output)"
                                        v-model="inputDatasets.get(`${output.history_content_type}_${output.hid}`).name"
                                        size="sm"
                                        class="ml-4 mt-1"
                                        style="max-width: 300px"
                                    />
                                </template>
                            </div>
                        </td>
                    </tr>
                </tbody>
            </table>
        </template>
    </div>
</template>

<style scoped lang="scss">
.workflow-extraction {
    max-width: 1200px;
    margin: 0 auto;
    padding: 1rem;
}

.extraction-table {
    .disabled {
        opacity: 0.7;
    }
}

.tool-form {
    border: 1px solid #ddd;
    border-radius: 4px;
    margin: 0.5rem 0;

    &.tool-form-disabled {
        background-color: #f8f9fa;

        .tool-form-title {
            color: #6c757d;
        }
    }
}

.tool-form-title {
    background-color: #e9ecef;
    padding: 0.5rem 1rem;
    font-weight: bold;
    border-bottom: 1px solid #ddd;
}

.tool-form-body {
    padding: 0.5rem 1rem;
}

.output-item {
    margin: 0.5rem 0;
}

.dataset-item {
    display: inline-block;
    padding: 0.25rem 0.5rem;
    border-radius: 4px;
    background-color: #f0f0f0;

    &.deleted {
        opacity: 0.5;
        text-decoration: line-through;
    }

    &.state-ok {
        border-left: 3px solid #28a745;
    }

    &.state-error {
        border-left: 3px solid #dc3545;
    }

    &.state-running, &.state-queued {
        border-left: 3px solid #ffc107;
    }

    .hid {
        font-weight: bold;
        margin-right: 0.25rem;
    }
}
</style>

2.3 Route Registration

File: client/src/entry/analysis/router.js

Add import and route:

// Add import at top
import WorkflowExtraction from "@/components/Workflow/WorkflowExtraction.vue";

// Add route in the children array of Analysis routes (around line 880)
{
    path: "workflows/extract",
    component: WorkflowExtraction,
    redirect: redirectAnon(),
    props: (route) => ({
        historyId: route.query.history_id,
    }),
},

Phase 3: Integration

3.1 Update HistoryOptions.vue

File: client/src/components/History/HistoryOptions.vue

Replace the iframeRedirect call with Vue router navigation:

<!-- Before (lines 210-217) -->
<BDropdownItem
    v-if="historyStore.currentHistoryId === history.id"
    :disabled="isAnonymous"
    :title="userTitle('Convert History to Workflow')"
    @click="iframeRedirect(`/workflow/build_from_current_history?history_id=${history.id}`)">
    <FontAwesomeIcon fixed-width :icon="faFileExport" />
    <span v-localize>Extract Workflow</span>
</BDropdownItem>

<!-- After -->
<BDropdownItem
    v-if="historyStore.currentHistoryId === history.id"
    :disabled="isAnonymous"
    :title="userTitle('Convert History to Workflow')"
    :to="`/workflows/extract?history_id=${history.id}`">
    <FontAwesomeIcon fixed-width :icon="faFileExport" />
    <span v-localize>Extract Workflow</span>
</BDropdownItem>

Remove iframeRedirect import if no longer used elsewhere.


Phase 4: Cleanup

4.1 Remove Legacy Code

  1. Delete Mako template: templates/build_from_current_history.mako

  2. Remove controller method from lib/galaxy/webapps/galaxy/controllers/workflow.py:

    • Delete build_from_current_history() method (lines 182-230)
  3. Update imports if summarize is no longer imported in controller


File Changes Summary

New Files

File | Description
---- | -----------
lib/galaxy/schema/workflow_extraction.py | Pydantic models for extraction API
client/src/api/workflowExtraction.ts | TypeScript API client
client/src/components/Workflow/WorkflowExtraction.vue | Main Vue component

Modified Files

File | Changes
---- | -------
lib/galaxy/webapps/galaxy/api/histories.py | Add extraction_summary endpoint
lib/galaxy/webapps/galaxy/services/histories.py | Add get_extraction_summary method
client/src/entry/analysis/router.js | Add route for WorkflowExtraction
client/src/components/History/HistoryOptions.vue | Replace iframeRedirect with router link

Deleted Files

File | Reason
---- | ------
templates/build_from_current_history.mako | Replaced by Vue component

Controller Cleanup

File | Changes
---- | -------
lib/galaxy/webapps/galaxy/controllers/workflow.py | Remove build_from_current_history method

Test Plan

Backend Tests

File: lib/galaxy_test/api/test_workflow_extraction.py (add tests)

Review Notes:

  • Use ApiTestCase as base class (Galaxy convention for API tests)
  • Original plan had only 4 tests; expanded to ~15 for proper coverage
  • Tests grouped by: basic functionality, edge cases, authorization, error handling
from galaxy_test.base.api import ApiTestCase
from galaxy_test.base.populators import DatasetPopulator


class TestWorkflowExtractionSummaryAPI(ApiTestCase):
    """Tests for GET /api/histories/{history_id}/extraction_summary endpoint."""

    dataset_populator: DatasetPopulator

    def setUp(self):
        super().setUp()
        self.dataset_populator = DatasetPopulator(self.galaxy_interactor)

    # === Basic Functionality Tests ===

    def test_extraction_summary_empty_history(self):
        """Empty history returns empty jobs list."""
        history_id = self.dataset_populator.new_history()
        response = self._get(f"histories/{history_id}/extraction_summary")
        self._assert_status_code_is(response, 200)
        summary = response.json()
        assert summary["jobs"] == []
        assert summary["history_id"] == history_id
        assert "default_workflow_name" in summary

    def test_extraction_summary_with_uploaded_dataset(self):
        """Uploaded dataset appears as fake job."""
        history_id = self.dataset_populator.new_history()
        self.dataset_populator.new_dataset(history_id, wait=True)

        response = self._get(f"histories/{history_id}/extraction_summary")
        self._assert_status_code_is(response, 200)
        summary = response.json()

        # Should have exactly one fake job for the upload
        assert len(summary["jobs"]) == 1
        job = summary["jobs"][0]
        assert job["is_fake"] is True
        assert job["is_workflow_compatible"] is False
        assert len(job["outputs"]) == 1
        assert job["outputs"][0]["hid"] == 1

    def test_extraction_summary_with_tool_run(self):
        """Tool run appears as workflow-compatible job."""
        history_id = self.dataset_populator.new_history()
        hda = self.dataset_populator.new_dataset(history_id, wait=True)

        # Run cat tool
        self.dataset_populator.run_tool(
            tool_id="cat1",
            inputs={"input1": {"src": "hda", "id": hda["id"]}},
            history_id=history_id,
        )
        self.dataset_populator.wait_for_history(history_id)

        response = self._get(f"histories/{history_id}/extraction_summary")
        self._assert_status_code_is(response, 200)
        summary = response.json()

        # Should have 2 jobs: fake (upload) + cat
        assert len(summary["jobs"]) == 2

        cat_jobs = [j for j in summary["jobs"] if j["tool_id"] and "cat" in j["tool_id"]]
        assert len(cat_jobs) == 1
        assert cat_jobs[0]["is_workflow_compatible"] is True
        assert cat_jobs[0]["is_fake"] is False

    def test_extraction_summary_job_outputs(self):
        """Job outputs include correct metadata."""
        history_id = self.dataset_populator.new_history()
        hda = self.dataset_populator.new_dataset(history_id, content="test", wait=True)

        response = self._get(f"histories/{history_id}/extraction_summary")
        summary = response.json()

        output = summary["jobs"][0]["outputs"][0]
        assert "id" in output
        assert "hid" in output
        assert "name" in output
        assert "state" in output
        assert "deleted" in output
        assert output["history_content_type"] == "dataset"

    # === Edge Cases ===

    def test_extraction_summary_deleted_dataset(self):
        """Deleted dataset still appears but marked as deleted."""
        history_id = self.dataset_populator.new_history()
        hda = self.dataset_populator.new_dataset(history_id, wait=True)

        # Delete the dataset
        self._delete(f"histories/{history_id}/contents/{hda['id']}")

        response = self._get(f"histories/{history_id}/extraction_summary")
        summary = response.json()

        assert len(summary["jobs"]) == 1
        assert summary["jobs"][0]["outputs"][0]["deleted"] is True
        assert summary["jobs"][0]["has_non_deleted_outputs"] is False

    def test_extraction_summary_with_collection(self):
        """Dataset collection appears in extraction summary."""
        history_id = self.dataset_populator.new_history()
        hdca = self.dataset_populator.new_collection(
            history_id,
            collection_type="list",
            element_identifiers=[{"name": "elem1", "src": "new_collection", "collection_type": ""}],
        )
        self.dataset_populator.wait_for_history(history_id)

        response = self._get(f"histories/{history_id}/extraction_summary")
        summary = response.json()

        # Should have job(s) for the collection
        collection_outputs = [
            o for j in summary["jobs"] for o in j["outputs"]
            if o["history_content_type"] == "dataset_collection"
        ]
        assert len(collection_outputs) >= 1

    def test_extraction_summary_tool_version_warning(self):
        """Version mismatch produces warning."""
        # This test would need a tool with version changes
        # Placeholder for when such a test tool is available
        pass

    # === Authorization Tests ===

    def test_extraction_summary_owner_access(self):
        """History owner can access extraction summary."""
        history_id = self.dataset_populator.new_history()
        response = self._get(f"histories/{history_id}/extraction_summary")
        self._assert_status_code_is(response, 200)

    def test_extraction_summary_nonexistent_history(self):
        """Nonexistent history returns 404."""
        response = self._get("histories/nonexistent123/extraction_summary")
        self._assert_status_code_is(response, 404)

    def test_extraction_summary_inaccessible_history(self):
        """Inaccessible history returns 403."""
        # Create history as one user, try to access as another
        history_id = self.dataset_populator.new_history()

        # Create a different user and try to access
        with self._different_user():
            response = self._get(f"histories/{history_id}/extraction_summary")
            self._assert_status_code_is(response, 403)

    # === Response Format Tests ===

    def test_extraction_summary_response_schema(self):
        """Response matches expected schema."""
        history_id = self.dataset_populator.new_history()
        self.dataset_populator.new_dataset(history_id, wait=True)

        response = self._get(f"histories/{history_id}/extraction_summary")
        summary = response.json()

        # Top-level fields
        assert "history_id" in summary
        assert "history_name" in summary
        assert "jobs" in summary
        assert "warnings" in summary
        assert "default_workflow_name" in summary

        # Job fields
        job = summary["jobs"][0]
        required_job_fields = [
            "id", "tool_name", "tool_id", "tool_version",
            "is_workflow_compatible", "is_fake", "disabled_reason",
            "outputs", "version_warning", "has_non_deleted_outputs"
        ]
        for field in required_job_fields:
            assert field in job, f"Missing field: {field}"

    def test_extraction_summary_default_workflow_name(self):
        """Default workflow name includes history name."""
        history_id = self.dataset_populator.new_history(name="My Test History")

        response = self._get(f"histories/{history_id}/extraction_summary")
        summary = response.json()

        assert "My Test History" in summary["default_workflow_name"]

Frontend Tests

File: client/src/components/Workflow/WorkflowExtraction.test.ts (new file)

Review Note: Frontend tests use Vitest (not Jest). Update imports and mocking patterns accordingly.

import { mount } from "@vue/test-utils";
import { createTestingPinia } from "@pinia/testing";
import { BFormCheckbox, BFormInput } from "bootstrap-vue";
import flushPromises from "flush-promises";
import { describe, it, expect, beforeEach, vi } from "vitest";

import WorkflowExtraction from "./WorkflowExtraction.vue";

// Mock the API (Vitest pattern)
vi.mock("@/api/workflowExtraction", () => ({
    getExtractionSummary: vi.fn(),
    extractWorkflow: vi.fn(),
}));

import { getExtractionSummary, extractWorkflow } from "@/api/workflowExtraction";
import type { Mock } from "vitest";

const mockSummary = {
    history_id: "abc123",
    history_name: "Test History",
    jobs: [
        {
            id: "job1",
            tool_name: "Concatenate",
            tool_id: "cat1",
            tool_version: "1.0",
            is_workflow_compatible: true,
            is_fake: false,
            disabled_reason: null,
            outputs: [
                { id: "out1", hid: 2, name: "Output 1", state: "ok", deleted: false, history_content_type: "dataset" }
            ],
            version_warning: null,
            has_non_deleted_outputs: true,
        },
        {
            id: "fake_1",
            tool_name: "Import from History",
            tool_id: null,
            tool_version: null,
            is_workflow_compatible: false,
            is_fake: true,
            disabled_reason: "Input dataset",
            outputs: [
                { id: "in1", hid: 1, name: "Input 1", state: "ok", deleted: false, history_content_type: "dataset" }
            ],
            version_warning: null,
            has_non_deleted_outputs: true,
        }
    ],
    warnings: [],
    default_workflow_name: "Workflow constructed from history 'Test History'",
};

describe("WorkflowExtraction", () => {
    beforeEach(() => {
        vi.clearAllMocks();
        (getExtractionSummary as Mock).mockResolvedValue(mockSummary);
    });

    it("loads and displays extraction summary", async () => {
        const wrapper = mount(WorkflowExtraction, {
            propsData: { historyId: "abc123" },
            global: {
                plugins: [createTestingPinia()],
            },
        });

        await flushPromises();

        expect(getExtractionSummary).toHaveBeenCalledWith("abc123");
        expect(wrapper.text()).toContain("Concatenate");
        expect(wrapper.text()).toContain("Import from History");
    });

    it("pre-selects workflow-compatible jobs", async () => {
        const wrapper = mount(WorkflowExtraction, {
            propsData: { historyId: "abc123" },
            global: {
                plugins: [createTestingPinia()],
            },
        });

        await flushPromises();

        const checkboxes = wrapper.findAllComponents(BFormCheckbox);
        // The workflow-compatible job should be checked
        const compatibleCheckbox = checkboxes.find(c => c.text().includes("Concatenate"));
        expect(compatibleCheckbox?.props("checked")).toBe(true);
    });

    it("submits workflow extraction", async () => {
        (extractWorkflow as Mock).mockResolvedValue({ id: "wf123", url: "/workflows/wf123" });

        const wrapper = mount(WorkflowExtraction, {
            propsData: { historyId: "abc123" },
            global: {
                plugins: [createTestingPinia()],
            },
        });

        await flushPromises();

        // Click create button
        const createBtn = wrapper.find('[title="Create Workflow"]');
        await createBtn.trigger("click");
        await flushPromises();

        expect(extractWorkflow).toHaveBeenCalledWith(expect.objectContaining({
            from_history_id: "abc123",
            job_ids: ["job1"],
        }));
    });
});

Integration Tests

File: lib/galaxy_test/selenium/test_workflow_extraction.py (new file)

from .framework import SeleniumTestCase


class TestWorkflowExtractionVue(SeleniumTestCase):
    """Selenium tests for the Vue workflow extraction UI."""

    ensure_registered = True

    def test_extraction_ui_loads(self):
        """Test that the extraction UI loads for a history with content."""
        self.perform_upload(self.get_filename("1.fasta"))
        self.history_panel_wait_for_hid_ok(1)

        # Navigate to extraction
        self.navigate_to_extraction()

        # Should see the extraction form
        self.wait_for_selector(".workflow-extraction")
        self.assert_selector_present("input#workflow-name")

    def test_extraction_creates_workflow(self):
        """Test that workflow extraction creates and opens workflow."""
        self.perform_upload(self.get_filename("1.fasta"))
        self.history_panel_wait_for_hid_ok(1)

        # Run a tool
        self.tool_open("cat1")
        self.tool_set_value("input1", "1.fasta")
        self.tool_form_execute()
        self.history_panel_wait_for_hid_ok(2)

        # Navigate to extraction
        self.navigate_to_extraction()

        # Should have the cat job selected
        self.wait_for_selector("input[type='checkbox']:checked")

        # Create workflow
        self.click_button("Create Workflow")

        # Should navigate to editor
        self.wait_for_selector(".workflow-editor", timeout=30)

    def navigate_to_extraction(self):
        """Navigate to the workflow extraction page."""
        history_id = self.current_history_id()
        self.navigate_to(f"/workflows/extract?history_id={history_id}")

Migration Steps

  1. Phase 1: Backend API

    • Create schema file with Pydantic models
    • Add endpoint to histories API
    • Add service method
    • Write and run backend tests
  2. Phase 2: Vue Component

    • Create API client
    • Create Vue component
    • Add route
    • Write and run frontend tests
  3. Phase 3: Integration

    • Update HistoryOptions.vue to use Vue routing
    • Test end-to-end flow
    • Run Selenium tests
  4. Phase 4: Cleanup

    • Remove Mako template
    • Remove controller method
    • Update any remaining references
    • Final testing pass

Unresolved Questions

  1. Large history performance: Should API paginate jobs? Current Mako loads all at once. Consider lazy loading for histories with 100+ jobs.

    • Review suggestion: Add limit and offset query params; default to unpaginated for backwards compat.
  2. Input naming UX: Current Mako has inline text inputs for input names. Should we use a modal instead for cleaner UX?

  3. Accessibility: Current Mako has known accessibility issues (#17194). Vue component should use proper ARIA labels.

    • Review suggestion: Add aria-label to all checkboxes, use semantic HTML for table.
  4. Error states: How to handle partial failures (some jobs extractable, some not)?

    • Review suggestion: Continue showing all jobs; disable rows with errors; show inline error messages.
  5. Collection type display: Should we show collection types (list, paired, etc.) in the UI?

  6. URL back-compat: Should /workflow/build_from_current_history redirect to /workflows/extract?

    • Review suggestion: Yes, add a redirect in the legacy controller before removal (see the sketch after this list).
  7. FakeJob type imports: Need to verify FakeJob and DatasetCollectionCreationJob are exported from galaxy.workflow.extract for isinstance() checks.
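
For question 6, a minimal sketch of the suggested back-compat redirect, assuming the legacy controller method is reduced to a stub until final removal (the decorator and response helper shown are the usual legacy-controller idioms, but should be verified against the actual controller):

# Hypothetical stub in lib/galaxy/webapps/galaxy/controllers/workflow.py:
# replaces the Mako rendering with a redirect to the new Vue route.
@web.expose
def build_from_current_history(self, trans, **kwd):
    history_id = trans.security.encode_id(trans.history.id)
    return trans.response.send_redirect(f"/workflows/extract?history_id={history_id}")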


Review Status

Review Type | Status | Notes
----------- | ------ | -----
Python Code Structure | ✅ Reviewed | Type annotations added, helper method extracted
FastAPI Conventions | ✅ Reviewed | CBV pattern documented, error responses added
Business Logic Organization | ✅ Reviewed | Service layer properly separated
Dependency Injection | ✅ Reviewed | Uses existing manager pattern
Test Coverage | ✅ Reviewed | Expanded from 4 to ~15 tests, fixed base class
Frontend (Vitest) | ✅ Reviewed | Updated to Vitest patterns

Last Updated: Code review findings incorporated into plan.

Workflow Extraction API Performance Analysis

Executive Summary

The proposed workflow extraction summary API introduces performance concerns primarily around database query patterns for large histories. The core extraction logic in lib/galaxy/workflow/extract.py iterates through all visible history contents and loads related job associations via SQLAlchemy relationships. For typical histories (10-100 items), this presents no issues. However, histories with 500+ items could experience latency due to N+1 query patterns when traversing copied_from_history_dataset_association chains and lazy-loaded relationships.

The current Mako implementation (build_from_current_history) suffers from identical performance characteristics since it uses the same summarize() function. The migration to Vue/FastAPI does not inherently worsen performance, but it does present an opportunity to address these issues. Recommendation: Skip pagination for MVP, but implement eager loading optimizations and add performance instrumentation for monitoring.


Current Implementation Analysis

Mako Controller: build_from_current_history()

File: lib/galaxy/webapps/galaxy/controllers/workflow.py (lines 182-230)

if (job_ids is None and dataset_ids is None) or workflow_name is None:
    jobs, warnings = summarize(trans, history)
    return trans.fill_template("build_from_current_history.mako", jobs=jobs, warnings=warnings, history=history)

Key observations:

  • Loads ALL visible history contents synchronously in a single request
  • No pagination, limits, or streaming
  • All processing happens server-side before rendering
  • Template rendered after all data is loaded into memory

WorkflowSummary Class

File: lib/galaxy/workflow/extract.py (lines 243-389)

The WorkflowSummary class performs the heavy lifting:

def __summarize(self):
    for content in self.history.visible_contents:  # 2 queries: one for HDAs, one for HDCAs
        self.__summarize_content(content)

For each content item:

  • HDAs: Accesses creating_job_associations (lazy load per item)
  • HDCAs: Accesses creating_job_associations and potentially collection.first_dataset_element
  • Both: May traverse copied_from_* chains with additional queries
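
The resulting access pattern can be sketched as follows (simplified, not the actual Galaxy code) to show where each lazy load fires:

def sketch_summarize(history):
    """Simplified sketch of WorkflowSummary.__summarize()'s traversal.

    Each attribute access on a lazily loaded relationship issues its own
    SELECT, which is where the N+1 pattern comes from.
    """
    jobs = {}
    for content in history.visible_contents:               # 2 queries: HDAs + HDCAs
        for assoc in content.creating_job_associations:    # +1 query per content item
            jobs.setdefault(assoc.job, []).append(content)  # +1 query per association
    return jobs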

Database Query Analysis

Primary Query Pattern

Query | Description | Trigger | Estimated Count
----- | ----------- | ------- | ---------------
SELECT FROM history_dataset_association | Load visible HDAs | visible_contents property | 1
SELECT FROM history_dataset_collection_association | Load visible HDCAs | visible_contents property | 1
SELECT FROM job_to_output_dataset_association | Get creating job | creating_job_associations access | N (per HDA)
SELECT FROM job_to_output_dataset_collection_association | Get creating job for HDCA | creating_job_associations access | M (per HDCA)
SELECT FROM job | Load job details | Accessing job relationship | N+M (per association)
SELECT FROM history_dataset_association | Traverse copy chain | copied_from_history_dataset_association | Variable
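
These counts can be verified empirically rather than estimated; a minimal sketch, assuming access to the session's engine, that counts statements issued while summarize() runs:

from contextlib import contextmanager

from sqlalchemy import event
from sqlalchemy.engine import Engine


@contextmanager
def count_queries(engine: Engine):
    """Count SQL statements executed on `engine` inside the block."""
    counter = {"n": 0}

    def _before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        counter["n"] += 1

    event.listen(engine, "before_cursor_execute", _before_cursor_execute)
    try:
        yield counter
    finally:
        event.remove(engine, "before_cursor_execute", _before_cursor_execute)


# Usage sketch:
# with count_queries(session.get_bind()) as c:
#     summarize(trans, history)
# print(f"summarize() issued {c['n']} queries")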

Relationship Loading Strategy

From lib/galaxy/model/__init__.py:

# JobToOutputDatasetAssociation
dataset: Mapped["HistoryDatasetAssociation"] = relationship(
    lazy="joined", back_populates="creating_job_associations"
)
job: Mapped["Job"] = relationship(back_populates="output_datasets")

# HDA's creating_job_associations - DEFAULT LAZY (not explicitly specified)
creating_job_associations=relationship(JobToOutputDatasetAssociation, back_populates="dataset")

Critical Finding: The creating_job_associations relationship on HDA uses default lazy loading (SELECT per access), causing N+1 queries for histories with many items.
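
If threading per-query loader options through visible_contents proves awkward, the relationship declaration itself could opt into batched loading. A sketch of that alternative (note this changes loading behavior for every query that touches HDAs, so it would need wider performance review):

# Hypothetical model-level alternative to per-query selectinload():
creating_job_associations = relationship(
    JobToOutputDatasetAssociation,
    back_populates="dataset",
    lazy="selectin",  # batch-loads associations alongside the parent rows
)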

Copy Chain Traversal

The __original_hda() method is particularly problematic:

def __original_hda(self, hda):
    while hda.copied_from_history_dataset_association:  # Each iteration = 1 query
        hda = hda.copied_from_history_dataset_association
    return hda

For datasets copied multiple times (e.g., through history imports), this creates a chain of database queries.
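
One possible mitigation is to memoize chain resolution across the whole request so shared chain segments are walked at most once; a sketch, assuming the same copied_from_history_dataset_association relationship (first-time traversals still query, but repeats are served from the cache):

def build_original_hda_resolver():
    """Return a resolver that caches copy-chain lookups for one request."""
    cache = {}  # hda.id -> original (uncopied) HDA

    def resolve(hda):
        seen = []
        while hda.copied_from_history_dataset_association is not None:
            if hda.id in cache:
                hda = cache[hda.id]
                break
            seen.append(hda.id)
            hda = hda.copied_from_history_dataset_association
        for hda_id in seen:
            cache[hda_id] = hda  # every node on the path maps to the original
        return hda

    return resolve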


Performance Risks

Risk | Severity | Likelihood | Description | Mitigation
---- | -------- | ---------- | ----------- | ----------
N+1 queries on creating_job_associations | High | High | Every HDA/HDCA access triggers separate query | Eager load via selectinload()
Copy chain traversal | Medium | Medium | Recursive queries for copied datasets | Cache or batch load copy chains
Memory for large histories | Medium | Low | 1000+ items held in memory simultaneously | Stream results or paginate
Toolbox lookups | Low | High | trans.app.toolbox.get_tool() for each job | Minimal - toolbox is cached
Response serialization | Low | Low | JSON serialization of large response | Only for extreme cases (10k+ items)

Query Count Estimates

History Size | Estimated Queries (worst case) | Estimated Time
------------ | ------------------------------ | --------------
10 items | ~25 queries | <100ms
50 items | ~120 queries | ~200-500ms
100 items | ~250 queries | ~500ms-1s
500 items | ~1200 queries | ~3-5s
1000 items | ~2500 queries | ~8-15s
5000 items | ~12500 queries | 30s+ (timeout likely)

Note: Estimates assume typical SQLite/PostgreSQL latency. Actual performance varies by hardware and connection pooling.


Pagination Assessment

Current Galaxy Patterns

Galaxy's history contents API (/api/histories/{id}/contents) uses FilterQueryParams with limit and offset:

class FilterQueryParams(ValueFilterQueryParams, PaginationQueryParams):
    # Inherits from PaginationQueryParams:
    offset: Optional[int] = Field(default=0, ge=0)
    limit: Optional[int] = Field(default=None, ge=1)

The v2 index endpoint properly supports pagination:

contents = self.history_contents_manager.contents(
    history,
    filters=filters,
    limit=filter_query_params.limit,
    offset=filter_query_params.offset,
    ...
)

Workflow Extraction Pagination Challenges

Unlike simple content listing, workflow extraction has structural dependencies:

  1. Job-Dataset Relationships: Jobs and their outputs must be returned together
  2. Copy Chain Context: Pagination boundaries could split related copy chains
  3. UI Requirements: Users need to see all selectable items to make informed choices
  4. Selection State: Paginated selection requires client-side state management

Pagination Complexity

Approach | Complexity | UX Impact | Implementation Effort
-------- | ---------- | --------- | ---------------------
No pagination (current) | None | Full list visible | Done
Simple offset/limit | Low | May split job groups | Low
Job-based pagination | Medium | Preserves job integrity | Medium
Virtual scrolling (client) | Low | All data loaded, UI virtualized | Medium (frontend)

Recommendations

1. Immediate (MVP Release)

Skip pagination for MVP. Rationale:

  • Current Mako implementation has identical performance characteristics
  • Real-world histories rarely exceed 500 visible items
  • Adds complexity to both backend and frontend
  • Can be added incrementally if needed

Instead, implement these optimizations:

a) Eager load job associations in the query:

from sqlalchemy import select
from sqlalchemy.orm import selectinload

from galaxy.model import HistoryDatasetAssociation, JobToOutputDatasetAssociation

stmt = select(HistoryDatasetAssociation).where(...).options(
    selectinload(HistoryDatasetAssociation.creating_job_associations)
    .selectinload(JobToOutputDatasetAssociation.job)
)

b) Batch load copy chains if many copied items are detected

c) Add response time monitoring to track actual performance in production (see the sketch below)
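
For (c), a minimal instrumentation sketch that could wrap the service method until proper metrics are wired in (the label and logger naming are placeholders):

import logging
import time
from functools import wraps

log = logging.getLogger(__name__)


def timed(label: str):
    """Log the wall-clock duration of the wrapped call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                log.info("%s took %.1f ms", label, (time.monotonic() - start) * 1000)
        return wrapper
    return decorator


# Usage sketch in the service layer:
# @timed("extraction_summary")
# def get_extraction_summary(self, trans, history_id): ...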

2. Short-term (Post-MVP)

  • Add max_items parameter with a default (e.g., 500) and a warning if exceeded (see the sketch after this list)
  • Implement client-side virtual scrolling for large result sets
  • Consider background loading with polling for very large histories
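
A hypothetical post-MVP extension of the endpoint signature for the max_items idea; the parameter name and default are illustrative, not an agreed API:

from typing import Optional

from fastapi import Query

@router.get("/api/histories/{history_id}/extraction_summary")
def extraction_summary(
    self,
    history_id: HistoryIDPathParam,
    trans: ProvidesHistoryContext = DependsOnTrans,
    max_items: Optional[int] = Query(default=500, ge=1),
) -> WorkflowExtractionSummary:
    ...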

3. Long-term

  • Full pagination support if monitoring indicates need
  • Caching layer for frequently accessed histories
  • Database query optimization (indexes, materialized views)

Conclusion: Skip Pagination for MVP?

Yes - Skip pagination for MVP.

Justification:

  1. Parity with existing implementation: The Mako version has no pagination and has operated acceptably for years
  2. Typical usage patterns: Most workflows are extracted from histories with <100 items
  3. Complexity vs. benefit tradeoff: Pagination adds significant complexity to both backend (maintaining job grouping) and frontend (selection state management)
  4. Easy eager-loading wins: Simple SQLAlchemy optimizations can reduce query count by 80-90% without pagination
  5. Incremental approach: Pagination can be added later with monitoring data to justify the investment

Recommended follow-up actions:

  • Add eager loading for creating_job_associations in WorkflowSummary
  • Add request timing metrics to new API endpoint
  • Document known limitation for histories with 1000+ items
  • Create follow-up issue for pagination if monitoring shows need

Memory Concern Mitigation

Investigation Summary

A concern was raised: does the Mako implementation use any streaming or lazy-loading patterns that the JSON API cannot replicate? The answer is NO - both implementations hold all data in memory before sending to the client.

Evidence from Code Analysis

1. WorkflowSummary Eagerly Builds a Dict in Memory

File: lib/galaxy/workflow/extract.py (lines 243-258)

class WorkflowSummary:
    def __init__(self, trans, history):
        self.jobs = {}  # ← All jobs stored in dict
        # ...
        self.__summarize()  # Populates self.jobs fully before returning

The summarize() function returns (summary.jobs, summary.warnings) - a fully materialized dict and set. There is no generator or iterator returned; all processing completes before any data is returned.

2. visible_contents Uses Generators BUT for Iteration Only

File: lib/galaxy/model/__init__.py (lines 3962-3978)

@property
def visible_contents(self):
    return self.contents_iter(types=["dataset", "dataset_collection"], visible=True)

def contents_iter(self, **kwds):
    # Returns a generator via merge_sorted_iterables()
    return galaxy.util.merge_sorted_iterables(operator.attrgetter("hid"), *iters)

While visible_contents returns a generator (via merge_sorted_iterables), this only provides lazy iteration over SQLAlchemy query results. Crucially:

  • The WorkflowSummary.__summarize() method consumes this iterator completely in a for loop
  • Each item is processed and stored in self.jobs dict
  • By the time summarize() returns, ALL items are in memory

The generator pattern here saves memory during iteration (items aren't all loaded at once from DB), but the end result is still a fully-materialized jobs dict.
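
The distinction is easy to demonstrate in isolation; a toy sketch, unrelated to Galaxy's models:

def items():
    """A generator: values are produced lazily, one at a time."""
    for i in range(1_000_000):
        yield i

# Consuming the generator item-by-item keeps only one value live at a time:
total = sum(items())

# ...but collecting into a dict materializes every entry at once, which is
# what WorkflowSummary effectively does with its jobs dict:
jobs = {i: f"job-{i}" for i in items()}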

3. Mako Template Renders Synchronously, Not Streaming

File: lib/galaxy/webapps/base/webapp.py (lines 1087-1103)

def fill_template_mako(self, filename, template_lookup=None, **kwargs):
    template = template_lookup.get_template(filename)
    data = dict(...)
    data.update(kwargs)  # ← jobs dict passed here
    return template.render(**data)  # ← Full HTML string returned

Mako's template.render() returns a complete string. The entire HTML response is built in memory before being sent to the client. There is no chunked transfer or streaming.

4. Template Iterates Over Pre-Built Dict

File: templates/build_from_current_history.mako (line 126)

%for job, datasets in jobs.items():
    <%
    # ... process job and datasets
    %>
%endfor

The template receives the complete jobs dict and iterates over .items(). No lazy loading occurs at this point - all data is already in memory.

Memory Flow Comparison

Stage | Mako Implementation | JSON API Implementation
----- | ------------------- | -----------------------
DB Query | SQLAlchemy lazy query via generator | Same (uses summarize())
Processing | Iterates generator, builds jobs dict | Same (uses summarize())
Before Response | Full jobs dict + HTML string in memory | Full jobs dict + JSON string in memory
Response | Return HTML string | Return JSON string

Conclusion

The migration to JSON API introduces no new memory concerns. Both implementations:

  1. Use the same summarize() function that builds a complete jobs dict
  2. Hold all job/dataset data in memory before generating the response
  3. Generate a complete response string (HTML or JSON) synchronously
  4. Send the complete response to the client (no streaming)

The only difference is the final serialization format (HTML vs JSON). If anything, JSON is typically smaller than the equivalent HTML with styling/structure, so the API may use slightly less memory for the response payload.

Risk Assessment: Memory usage is NOT a differentiator between Mako and API approaches. Any memory optimization (pagination, streaming) would need to be applied to the shared summarize() function to benefit either approach.
