@PandaWhoCodes
Created April 2, 2026 07:08
Vertex AI RAG Ingestion Pipeline — Research & Plan (Saama)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Vertex AI RAG Ingestion Pipeline — Research & Plan</title>
<link href="https://fonts.googleapis.com/css2?family=Crimson+Pro:ital,wght@0,300;0,400;0,500;0,600;1,300;1,400&family=Overpass+Mono:wght@300;400;500&family=Nunito+Sans:wght@300;400;600;700&display=swap" rel="stylesheet">
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
:root {
--bg: #f7f4ef;
--bg-card: #ffffff;
--bg-code: #1e1e2e;
--text: #2c2a26;
--text-soft: #5c5850;
--text-dim: #9c978d;
--accent: #2563eb;
--accent-dark: #1d4ed8;
--accent-green: #16a34a;
--accent-red: #dc2626;
--accent-orange: #ea580c;
--accent-purple: #7c3aed;
--accent-dim: rgba(37, 99, 235, 0.08);
--accent-green-dim: rgba(22, 163, 74, 0.08);
--serif: 'Crimson Pro', Georgia, serif;
--sans: 'Nunito Sans', -apple-system, sans-serif;
--mono: 'Overpass Mono', monospace;
}
body {
background: var(--bg);
color: var(--text);
font-family: var(--sans);
font-size: 16px;
line-height: 1.7;
-webkit-font-smoothing: antialiased;
}
body::after {
content: '';
position: fixed;
inset: 0;
background-image: url("data:image/svg+xml,%3Csvg viewBox='0 0 256 256' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='noise'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.65' numOctaves='3' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23noise)' opacity='0.03'/%3E%3C/svg%3E");
pointer-events: none;
z-index: 9999;
}
.page { max-width: 720px; margin: 0 auto; padding: 3rem 1.5rem; }
/* Header */
.header { text-align: center; margin-bottom: 3rem; }
.header .tag {
font-family: var(--mono);
font-size: 0.55rem;
letter-spacing: 0.2em;
text-transform: uppercase;
color: var(--accent);
margin-bottom: 1rem;
display: block;
}
.header h1 {
font-family: var(--serif);
font-size: clamp(1.8rem, 5vw, 2.6rem);
font-weight: 300;
color: var(--text);
line-height: 1.2;
margin-bottom: 0.8rem;
}
.header .sub {
font-size: 0.85rem;
color: var(--text-dim);
max-width: 500px;
margin: 0 auto;
}
/* Ribbon */
.ribbon { width: 2px; height: 50px; background: var(--accent); margin: 2.5rem auto; opacity: 0.2; position: relative; }
.ribbon::after { content: ''; position: absolute; bottom: -6px; left: -3px; width: 8px; height: 8px; background: var(--accent); opacity: 0.4; border-radius: 50%; }
.divider { text-align: center; color: var(--text-dim); font-size: 1rem; letter-spacing: 0.5em; margin: 2rem 0; }
/* Section */
.section { margin-bottom: 3rem; }
.section h2 {
font-family: var(--serif);
font-size: 1.6rem;
font-weight: 400;
color: var(--text);
margin-bottom: 0.3rem;
}
.section .note {
font-size: 0.78rem;
color: var(--text-dim);
margin-bottom: 1.5rem;
}
.section h3 {
font-family: var(--sans);
font-size: 0.95rem;
font-weight: 700;
color: var(--text);
margin: 1.5rem 0 0.5rem;
}
p { margin-bottom: 1rem; color: var(--text-soft); font-size: 0.9rem; }
/* Architecture box */
.arch-box {
background: var(--bg-code);
color: #cdd6f4;
font-family: var(--mono);
font-size: 0.65rem;
line-height: 1.8;
padding: 1.5rem;
border-radius: 8px;
overflow-x: auto;
white-space: pre;
margin: 1.5rem 0;
}
.arch-box .hl { color: #89b4fa; }
.arch-box .hlg { color: #a6e3a1; }
.arch-box .hlo { color: #fab387; }
/* Cards */
.card {
background: var(--bg-card);
border: 1px solid rgba(0,0,0,0.05);
border-radius: 8px;
padding: 1.2rem 1.4rem;
margin-bottom: 1rem;
box-shadow: 0 1px 8px rgba(0,0,0,0.03);
}
.card-title {
font-family: var(--sans);
font-weight: 700;
font-size: 0.88rem;
color: var(--text);
margin-bottom: 0.3rem;
}
.card p { margin-bottom: 0.5rem; font-size: 0.82rem; }
/* Phase indicator */
.phase {
display: inline-flex;
align-items: center;
gap: 0.4rem;
font-family: var(--mono);
font-size: 0.55rem;
letter-spacing: 0.1em;
text-transform: uppercase;
padding: 0.2em 0.7em;
border-radius: 3px;
margin-bottom: 0.8rem;
}
.phase.blue { background: var(--accent-dim); color: var(--accent); }
.phase.green { background: var(--accent-green-dim); color: var(--accent-green); }
.phase.orange { background: rgba(234,88,12,0.08); color: var(--accent-orange); }
.phase.purple { background: rgba(124,58,237,0.08); color: var(--accent-purple); }
/* Code blocks */
.code-block {
background: var(--bg-code);
color: #cdd6f4;
font-family: var(--mono);
font-size: 0.68rem;
line-height: 1.7;
padding: 1.2rem 1.4rem;
border-radius: 6px;
overflow-x: auto;
margin: 1rem 0;
}
.code-label {
font-family: var(--mono);
font-size: 0.5rem;
letter-spacing: 0.1em;
text-transform: uppercase;
color: var(--text-dim);
margin-bottom: 0.3rem;
}
.code-block .kw { color: #cba6f7; }
.code-block .str { color: #a6e3a1; }
.code-block .cm { color: #6c7086; }
.code-block .fn { color: #89b4fa; }
/* Limits table */
.limits-table {
width: 100%;
border-collapse: collapse;
margin: 1rem 0;
font-size: 0.82rem;
}
.limits-table th {
font-family: var(--mono);
font-size: 0.55rem;
letter-spacing: 0.1em;
text-transform: uppercase;
color: var(--text-dim);
text-align: left;
padding: 0.5rem 0.8rem;
border-bottom: 2px solid rgba(0,0,0,0.08);
}
.limits-table td {
padding: 0.5rem 0.8rem;
border-bottom: 1px solid rgba(0,0,0,0.04);
color: var(--text-soft);
}
.limits-table td:last-child { font-family: var(--mono); font-size: 0.78rem; }
/* Decision cards */
.decision {
background: var(--bg-card);
border: 1px solid rgba(0,0,0,0.05);
border-left: 3px solid var(--accent);
border-radius: 0 6px 6px 0;
padding: 1rem 1.2rem;
margin-bottom: 0.8rem;
}
.decision .q {
font-family: var(--sans);
font-weight: 700;
font-size: 0.85rem;
color: var(--text);
margin-bottom: 0.3rem;
}
.decision .a {
font-size: 0.82rem;
color: var(--text-soft);
margin: 0;
}
.decision .verdict {
font-family: var(--mono);
font-size: 0.6rem;
color: var(--accent-green);
text-transform: uppercase;
letter-spacing: 0.1em;
margin-top: 0.3rem;
}
/* Timeline */
.timeline { margin: 1.5rem 0; }
.timeline-week {
display: flex;
gap: 1rem;
margin-bottom: 1.2rem;
align-items: flex-start;
}
.timeline-marker {
width: 36px;
height: 36px;
border-radius: 50%;
display: flex;
align-items: center;
justify-content: center;
font-family: var(--mono);
font-size: 0.6rem;
font-weight: 700;
color: white;
flex-shrink: 0;
}
.timeline-marker.w1 { background: var(--accent); }
.timeline-marker.w2 { background: var(--accent-green); }
.timeline-marker.w3 { background: var(--accent-orange); }
.timeline-marker.w4 { background: var(--accent-purple); }
.timeline-content { flex: 1; }
.timeline-content h4 {
font-size: 0.88rem;
font-weight: 700;
margin-bottom: 0.3rem;
}
.timeline-content ul {
list-style: none;
padding: 0;
}
.timeline-content li {
font-size: 0.8rem;
color: var(--text-soft);
padding: 0.15rem 0;
padding-left: 1rem;
position: relative;
}
.timeline-content li::before {
content: '→';
position: absolute;
left: 0;
color: var(--text-dim);
}
/* Industry patterns */
.pattern {
display: flex;
gap: 0.8rem;
align-items: flex-start;
margin-bottom: 1rem;
}
.pattern-num {
width: 28px;
height: 28px;
border-radius: 50%;
background: var(--accent-dim);
color: var(--accent);
display: flex;
align-items: center;
justify-content: center;
font-family: var(--mono);
font-size: 0.6rem;
font-weight: 700;
flex-shrink: 0;
}
.pattern-text { flex: 1; }
.pattern-text strong { font-size: 0.85rem; }
.pattern-text p { font-size: 0.8rem; margin: 0.2rem 0 0; }
/* Status badge */
.badge {
display: inline-block;
font-family: var(--mono);
font-size: 0.5rem;
letter-spacing: 0.08em;
text-transform: uppercase;
padding: 0.15em 0.5em;
border-radius: 3px;
margin-right: 0.3rem;
}
.badge.done { background: rgba(22,163,74,0.1); color: var(--accent-green); }
.badge.todo { background: rgba(37,99,235,0.1); color: var(--accent); }
.badge.rec { background: rgba(234,88,12,0.1); color: var(--accent-orange); }
/* Footer */
.footer {
margin-top: 3rem;
padding-top: 1.5rem;
border-top: 1px solid rgba(0,0,0,0.06);
text-align: center;
}
.footer p { font-size: 0.7rem; color: var(--text-dim); margin-bottom: 0.2rem; }
/* Scroll */
.fade-in { opacity: 0; transform: translateY(14px); transition: opacity 0.6s ease, transform 0.6s ease; }
.fade-in.visible { opacity: 1; transform: translateY(0); }
</style>
</head>
<body>
<div class="page">
<div class="header">
<span class="tag">Saama · Vertex AI · ADK · Research Notes</span>
<h1>RAG Ingestion Pipeline</h1>
<p class="sub">Research & plan for file upload → processing → Vertex AI RAG Engine. Based on Ashish & Abhijit's call, April 2, 2026.</p>
</div>
<div class="ribbon"></div>
<!-- PROBLEM -->
<div class="section fade-in">
<h2>The Problem</h2>
<p>Users upload files (up to <strong>1 GB</strong>, including zip archives) from a React frontend. These need to land in GCS without hitting the backend pod, get extracted/validated/split, maintain parent-child relationships in the UI, track status in the DB, and be ingested into Vertex AI RAG Engine with proper context and metadata.</p>
<p>Abhijit tried Airflow GCS hooks before and they didn't work reliably, so a better trigger mechanism is needed.</p>
</div>
<!-- ARCHITECTURE -->
<div class="section fade-in">
<h2>Architecture</h2>
<div class="arch-box"><span class="hl">React App</span> → Backend (<span class="hlg">signed URL</span>) → <span class="hlo">GCS /uploads/</span>
<span class="hlg">Eventarc trigger</span>
<span class="hl">Cloud Run processor</span>
┌─────────┴─────────┐
│ │
<span class="hlo">Zip? Extract</span> <span class="hlo">Single file</span>
│ │
└─────────┬─────────┘
<span class="hlg">Validate + Split</span>
(PDF by page, DOCX by heading)
<span class="hlo">GCS /processed/</span>
<span class="hl">Update DB status</span>
<span class="hlg">ImportRagFiles API</span>
(+ Layout Parser)
<span class="hl">Vertex AI RAG Corpus</span></div>
</div>
<!-- VERTEX LIMITS -->
<div class="section fade-in">
<h2>Vertex AI RAG Limits</h2>
<p class="note">These are the hard constraints your processing pipeline must respect</p>
<table class="limits-table">
<thead><tr><th>File Type</th><th>Max Size</th><th>Notes</th></tr></thead>
<tbody>
<tr><td>PDF</td><td>50 MB</td><td>500 pages max with Layout Parser</td></tr>
<tr><td>DOCX</td><td>50 MB</td><td>Split by heading structure</td></tr>
<tr><td>Text / Markdown</td><td>10 MB</td><td></td></tr>
<tr><td>HTML / JSON</td><td>10 MB</td><td></td></tr>
<tr><td>ZIP</td><td>❌ Not supported</td><td>Must extract first</td></tr>
<tr><td>Chunking default</td><td>1024 tokens</td><td>256 token overlap</td></tr>
</tbody>
</table>
</div>
<div class="divider">· · ·</div>
<!-- PHASE 1 -->
<div class="section fade-in">
<span class="phase blue">Phase 1 — Upload</span>
<h2>Browser → GCS via Signed URLs</h2>
<p>React requests a V4 signed URL from the backend, then uploads directly to GCS. <strong>Zero backend memory usage.</strong> This comfortably handles the 1 GB files here; for files over 5 GB, use the resumable upload protocol.</p>
<div class="card">
<div class="card-title">How it works</div>
<p>1. React calls <code>/api/upload/signed-url</code> with filename & content type</p>
<p>2. Backend generates V4 signed PUT URL (15 min expiry, scoped to user folder)</p>
<p>3. React does <code>fetch(signed_url, { method: 'PUT', body: file })</code> — direct to GCS</p>
<p>4. On completion, React notifies backend with the object path</p>
</div>
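<p>Steps 1 and 2 can be sketched as below. This is a minimal sketch using the <code>google-cloud-storage</code> client: the <code>uploads/{user_id}/{ts}/</code> layout follows this plan, while the helper names and endpoint wiring are illustrative, not committed code.</p>

```python
from datetime import timedelta


def build_object_path(user_id: str, ts: str, filename: str) -> str:
    """Scope each upload to a per-user folder, as the plan describes."""
    safe_name = filename.replace("/", "_")  # avoid path traversal into other folders
    return f"uploads/{user_id}/{ts}/{safe_name}"


def make_signed_put_url(bucket_name: str, object_path: str, content_type: str) -> str:
    """Generate a V4 signed PUT URL with a 15-minute expiry (requires GCS credentials)."""
    from google.cloud import storage  # deferred: needs google-cloud-storage installed

    blob = storage.Client().bucket(bucket_name).blob(object_path)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),
        method="PUT",
        content_type=content_type,  # the browser PUT must send the same Content-Type
    )
```

<p>Because the URL is signed for one object path and one method, a leaked URL can only PUT that one object for 15 minutes.</p>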
<div class="card">
<div class="card-title">Don't forget</div>
<p>• Set CORS on the GCS bucket to allow PUT from your frontend origin</p>
<p>• For resumable uploads (large files): initiate via JSON API, then upload in chunks</p>
<p>• Abhijit confirmed he's seen demos of this working — it's production-proven</p>
</div>
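<p>For the resumable path, the backend initiates a session and hands the session URL to the browser, which then uploads in chunks; GCS requires each chunk (except the last) to be a multiple of 256 KiB. A sketch, assuming the same <code>google-cloud-storage</code> client; the function names are illustrative:</p>

```python
CHUNK_UNIT = 256 * 1024  # resumable upload chunks must be multiples of 256 KiB


def align_chunk_size(requested_bytes: int) -> int:
    """Round a requested chunk size up to the nearest 256 KiB multiple."""
    units = max(1, -(-requested_bytes // CHUNK_UNIT))  # ceiling division
    return units * CHUNK_UNIT


def start_resumable_session(bucket_name: str, object_path: str,
                            content_type: str, frontend_origin: str) -> str:
    """Initiate a resumable upload; the returned session URL goes back to the browser."""
    from google.cloud import storage  # deferred: needs google-cloud-storage installed

    blob = storage.Client().bucket(bucket_name).blob(object_path)
    # `origin` pins the session to the frontend origin for CORS purposes.
    return blob.create_resumable_upload_session(
        content_type=content_type, origin=frontend_origin
    )
```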
</div>
<!-- PHASE 2 -->
<div class="section fade-in">
<span class="phase green">Phase 2 — Trigger</span>
<h2>Eventarc, Not Airflow Polling</h2>
<div class="decision">
<div class="q">Option A: Eventarc + Cloud Run <span class="badge rec">Recommended</span></div>
<p class="a">GCS <code>object.finalized</code> event → Cloud Run service. No polling, no missed files. Google's recommended pattern. This is the "TCP connector / GCS hook" you were asking about — but managed by Google.</p>
<div class="verdict">→ Use this as primary trigger</div>
</div>
<div class="decision">
<div class="q">Option B: Airflow / Cloud Composer</div>
<p class="a"><code>GCSObjectExistenceSensor</code> or trigger DAG via Cloud Functions. Better for complex orchestration. But Abhijit already tried GCS hooks in Airflow and they weren't reliable.</p>
<div class="verdict">→ Use only if you need complex DAG orchestration on top</div>
</div>
<div class="decision">
<div class="q">Option C: Backend notification (fallback)</div>
<p class="a">Frontend notifies backend after upload → backend triggers processing. Simple but what if frontend crashes mid-upload?</p>
<div class="verdict">→ Use as belt-and-suspenders alongside Eventarc</div>
</div>
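<p>When <code>object.finalized</code> fires, Eventarc delivers a CloudEvent whose JSON data carries the bucket and object name. The core of the handler can stay small and pure (a sketch; in the real Cloud Run service this would sit behind an HTTP or functions-framework handler, which is assumed here and not shown):</p>

```python
def parse_gcs_event(event_data: dict) -> tuple:
    """Extract (bucket, object_name) from a storage object.finalized payload."""
    return event_data["bucket"], event_data["name"]


def should_process(object_name: str) -> bool:
    """React only to raw uploads; writes to /processed/ must not re-trigger us."""
    return object_name.startswith("uploads/")
```

<p>Filtering on the <code>uploads/</code> prefix matters: the processor writes into <code>/processed/</code>, which also fires <code>object.finalized</code>, so without the check the service would trigger itself in a loop.</p>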
</div>
<!-- PHASE 3 -->
<div class="section fade-in">
<span class="phase orange">Phase 3 — Processing</span>
<h2>Extract, Validate, Split</h2>
<p>Cloud Run service handles all post-upload processing. Cloud Run supports up to 32GB memory — enough for 1GB zips.</p>
<h3>Step 1: Zip Extraction</h3>
<p>Download zip from GCS → extract to /tmp → filter supported types → upload individual files to <code>/processed/{user_id}/{parent_file_id}/</code></p>
<h3>Step 2: Validation</h3>
<p>Check file types, reject unsupported formats, scan for corruption. Update DB status to "failed" with reason if rejected.</p>
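<p>A minimal validator keyed on the limits table above (corruption scanning, e.g. attempting to open the file with <code>pypdf</code>, is omitted from this sketch):</p>

```python
from pathlib import Path

MAX_BYTES = {  # hard limits from the Vertex AI RAG table above
    ".pdf": 50 * 1024 * 1024,
    ".docx": 50 * 1024 * 1024,
    ".txt": 10 * 1024 * 1024,
    ".md": 10 * 1024 * 1024,
    ".html": 10 * 1024 * 1024,
    ".json": 10 * 1024 * 1024,
}


def validate(filename: str, size_bytes: int) -> tuple:
    """Return (ok, reason); the reason string goes into the DB 'failed' status."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".zip":
        return False, "zip must be extracted before ingestion"
    if suffix not in MAX_BYTES:
        return False, f"unsupported type: {suffix}"
    if size_bytes > MAX_BYTES[suffix]:
        return False, f"{suffix} exceeds {MAX_BYTES[suffix]} bytes; split first"
    return True, "ok"
```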
<h3>Step 3: Split Large Files</h3>
<div class="card">
<div class="card-title">PDF splitting</div>
<p>Split by pages (max 100 pages per chunk). Use <code>pypdf</code>. Preserve metadata: original title, page range, total pages.</p>
</div>
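<p>Page splitting can be sketched with <code>pypdf</code>; the output naming is illustrative, and the page-range math is kept pure so it can be checked independently:</p>

```python
def page_ranges(total_pages: int, max_pages: int = 100) -> list:
    """Compute (start, end) page ranges, end-exclusive, for splitting a PDF."""
    return [(start, min(start + max_pages, total_pages))
            for start in range(0, total_pages, max_pages)]


def split_pdf(src_path: str) -> list:
    """Write one sub-PDF per range, returning (path, start, end) for DB tracking."""
    from pypdf import PdfReader, PdfWriter  # deferred: needs pypdf installed

    reader = PdfReader(src_path)
    outputs = []
    for start, end in page_ranges(len(reader.pages)):
        writer = PdfWriter()
        for i in range(start, end):
            writer.add_page(reader.pages[i])
        out_path = f"{src_path}.part-{start}-{end}.pdf"
        with open(out_path, "wb") as fh:
            writer.write(fh)
        outputs.append((out_path, start, end))
    return outputs
```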
<div class="card">
<div class="card-title">DOCX splitting — the hard one ⚡</div>
<p>Abhijit's concern is valid: arbitrary byte splitting loses context. <strong>Split by heading structure</strong> (Heading 1 boundaries), not arbitrary cuts. Add a context preamble to each sub-doc: document title + section headers covered.</p>
</div>
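<p>The heading-based split reduces to grouping paragraphs at each Heading 1 boundary. A sketch over plain <code>(style_name, text)</code> pairs; with <code>python-docx</code> these would come from <code>[(p.style.name, p.text) for p in Document(path).paragraphs]</code>:</p>

```python
def split_by_heading1(paragraphs: list) -> list:
    """Group (style_name, text) pairs into sections at each 'Heading 1' boundary.

    Any front matter before the first heading stays with the first section.
    """
    sections, current = [], []
    for style, text in paragraphs:
        if style == "Heading 1" and current:
            sections.append(current)
            current = []
        current.append((style, text))
    if current:
        sections.append(current)
    return sections
```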
<h3>Step 4: Context Preservation (THE KEY)</h3>
<div class="card">
<div class="card-title">Document AI Layout Parser <span class="badge rec">Google's solution</span></div>
<p>Integrates directly with RAG Engine. Understands headings, tables, lists, sections. Creates context-aware chunks that respect layout. Just pass <code>layout_parser</code> config during import — one line. Max 20MB/500 pages per PDF.</p>
</div>
<div class="card">
<div class="card-title">Metadata-enriched chunks (custom splitting)</div>
<p>When splitting a 200-page PDF into 2×100-page chunks: add context preamble as metadata (doc title, section headings, page range). Store original document ID for traceability. RAG Engine's <code>chunk_overlap=256</code> tokens bridges boundaries.</p>
</div>
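<p>The preamble itself is a few lines of text prepended to each sub-document; the exact wording below is illustrative:</p>

```python
def context_preamble(doc_title: str, headings: list, page_range: tuple,
                     total_pages: int) -> str:
    """Build the preamble prepended to each sub-document so chunks keep context."""
    start, end = page_range  # end-exclusive, matching page_ranges()
    lines = [
        f"Document: {doc_title}",
        f"Pages {start + 1}-{end} of {total_pages}",
    ]
    if headings:
        lines.append("Sections: " + "; ".join(headings))
    return "\n".join(lines) + "\n\n"
```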
<h3>Step 5: Parent-Child Tracking</h3>
<div class="card">
<div class="card-title">DB Model (Abhijit already committed table changes)</div>
<p><strong>files table:</strong> id, user_id, original_filename, status, gcs_path, size, type, created_at</p>
<p><strong>file_chunks table:</strong> id, parent_file_id, chunk_index, gcs_path, page_range, size, status</p>
<p>User sees parent file in UI → system sends all child chunks to Vertex AI for retrieval</p>
</div>
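<p>Abhijit's committed schema is the source of truth; as an illustration of the columns listed above, the two tables look roughly like this (shown here with SQLite for portability):</p>

```python
import sqlite3

SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    original_filename TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'processing',
    gcs_path TEXT,
    size INTEGER,
    type TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE file_chunks (
    id INTEGER PRIMARY KEY,
    parent_file_id INTEGER NOT NULL REFERENCES files(id),
    chunk_index INTEGER NOT NULL,
    gcs_path TEXT,
    page_range TEXT,
    size INTEGER,
    status TEXT NOT NULL DEFAULT 'processing'
);
"""


def open_db(path=":memory:"):
    """Open a connection with both tracking tables created."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

<p>The UI queries <code>files</code> only; retrieval fans out to every row in <code>file_chunks</code> with the matching <code>parent_file_id</code>.</p>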
</div>
<!-- PHASE 4 -->
<div class="section fade-in">
<span class="phase purple">Phase 4 — Ingest</span>
<h2>Vertex AI RAG Import</h2>
<p>Call <code>rag.import_files()</code> with Layout Parser config. Built-in deduplication handles re-uploads. Log results to BigQuery for debugging.</p>
<div class="card">
<div class="card-title">Key config</div>
<p><code>chunk_size=1024</code> tokens, <code>chunk_overlap=256</code> tokens</p>
<p>Layout Parser processor for PDFs with tables/charts</p>
<p><code>import_result_sink</code> → BigQuery table for failure debugging</p>
<p><code>max_embedding_requests_per_min=900</code> (rate limiting)</p>
</div>
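<p>A sketch of the import call under the key config above. The <code>vertexai.preview.rag</code> surface has changed between SDK releases (newer versions wrap chunking in a transformation config object), so treat the argument names here as assumptions to verify against the installed version:</p>

```python
def gcs_import_paths(bucket: str, user_id: str, parent_id: str) -> list:
    """Build the /processed/ prefix the import call reads from."""
    return [f"gs://{bucket}/processed/{user_id}/{parent_id}/"]


def import_to_corpus(corpus_name: str, paths: list, result_sink: str):
    """Kick off ingestion with this plan's key config (deferred cloud import)."""
    from vertexai.preview import rag  # needs google-cloud-aiplatform installed

    return rag.import_files(
        corpus_name,
        paths=paths,
        chunk_size=1024,                  # key config from this plan
        chunk_overlap=256,
        max_embedding_requests_per_min=900,
        import_result_sink=result_sink,   # e.g. "bq://project.dataset.import_results"
    )
```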
</div>
<div class="ribbon"></div>
<!-- INDUSTRY -->
<div class="section fade-in">
<h2>What the Industry Does</h2>
<p class="note">You don't have to figure this out from scratch — it's been done</p>
<div class="pattern">
<div class="pattern-num">1</div>
<div class="pattern-text">
<strong>Unstructured.io</strong> (most popular OSS)
<p>Parses PDFs, DOCX, PPTX, emails, HTML. Partitions by document elements. Preserves hierarchy + metadata. Used by LangChain, LlamaIndex, most RAG pipelines.</p>
</div>
</div>
<div class="pattern">
<div class="pattern-num">2</div>
<div class="pattern-text">
<strong>LangChain Splitters</strong>
<p><code>RecursiveCharacterTextSplitter</code> with chunk_size + overlap. <code>PyPDFLoader</code> splits by pages with metadata. Good for simple documents.</p>
</div>
</div>
<div class="pattern">
<div class="pattern-num">3</div>
<div class="pattern-text">
<strong>Google DIY RAG Reference</strong>
<p>Document AI Layout Parser → context-aware chunking → Vertex AI Vector Search → Check Grounding API. Full Colab notebook available.</p>
</div>
</div>
<div class="pattern">
<div class="pattern-num">4</div>
<div class="pattern-text">
<strong>Event-driven Auto-sync</strong> (production pattern)
<p>GCS upload → Eventarc → Cloud Run → ImportRagFiles. Terraform for infra, Pub/Sub for batching. Blog: "Auto-Sync RAG Pipeline" by Suhas Mallesh.</p>
</div>
</div>
</div>
<div class="divider">· · ·</div>
<!-- PLAN -->
<div class="section fade-in">
<h2>4-Week Plan</h2>
<div class="timeline">
<div class="timeline-week">
<div class="timeline-marker w1">W1</div>
<div class="timeline-content">
<h4>Upload + Extraction</h4>
<ul>
<li>Signed URL endpoint (V4, 15-min expiry, per-user folder)</li>
<li>GCS bucket: <code>/uploads/{user_id}/{ts}/</code> raw, <code>/processed/{user_id}/{parent_id}/</code> extracted</li>
<li>Eventarc trigger on <code>object.finalized</code></li>
<li>Cloud Run processor: zip extraction + file splitting</li>
</ul>
</div>
</div>
<div class="timeline-week">
<div class="timeline-marker w2">W2</div>
<div class="timeline-content">
<h4>Validation + Processing + DB</h4>
<ul>
<li>Validation layer: check types, reject unsupported, scan corruption</li>
<li>Folder/subfolder creation for UI</li>
<li>Parent-child tracking in DB <span class="badge done">Abhijit committed tables</span></li>
<li>Real-time status updates (processing → available → failed)</li>
</ul>
</div>
</div>
<div class="timeline-week">
<div class="timeline-marker w3">W3</div>
<div class="timeline-content">
<h4>Ingestion Pipeline</h4>
<ul>
<li>Enable Document AI Layout Parser</li>
<li>RAG corpus: 1024 tokens, 256 overlap</li>
<li>ImportRagFiles from /processed/ with Layout Parser</li>
<li>Parent → child chunk mapping for UI retrieval</li>
<li>Import result sink to BigQuery</li>
</ul>
</div>
</div>
<div class="timeline-week">
<div class="timeline-marker w4">W4</div>
<div class="timeline-content">
<h4>Integration + Testing</h4>
<ul>
<li>ADK agent with <code>vertex_ai_rag_retrieval</code></li>
<li>Error handling: retry, dead letter queue</li>
<li>E2E test: 1GB zip → extract → split → ingest → chat</li>
</ul>
</div>
</div>
</div>
</div>
<!-- DECISIONS -->
<div class="section fade-in">
<h2>Decisions for Monday</h2>
<div class="decision">
<div class="q">1. Eventarc vs Airflow?</div>
<p class="a">Eventarc for file-arrival triggers. Abhijit's Airflow GCS hooks weren't reliable. Eventarc can optionally trigger an Airflow DAG if complex orchestration is needed.</p>
<div class="verdict">→ Eventarc primary, Airflow optional orchestration layer</div>
</div>
<div class="decision">
<div class="q">2. Cloud Run vs dedicated pod?</div>
<p class="a">Cloud Run scales to zero, handles burst, 32GB memory. Dedicated pod only if processing >60 min per file.</p>
<div class="verdict">→ Cloud Run for most cases</div>
</div>
<div class="decision">
<div class="q">3. Layout Parser vs default?</div>
<p class="a">Layout Parser for PDFs with tables/charts (higher accuracy, Document AI pricing). Default for plain text.</p>
<div class="verdict">→ Layout Parser for PDFs, default for text</div>
</div>
<div class="decision">
<div class="q">4. DOCX splitting strategy?</div>
<p class="a">Split by Heading 1 boundaries, not arbitrary bytes. Context preamble per chunk. This preserves the context Abhijit was worried about.</p>
<div class="verdict">→ Heading-based splitting + context preamble</div>
</div>
<div class="decision">
<div class="q">5. Parent-child UI model?</div>
<p class="a">User sees parent file, system sends child chunks to Vertex. DB: files + file_chunks tables. Abhijit already committed the schema changes.</p>
<div class="verdict">→ Build on Abhijit's committed tables</div>
</div>
<div class="decision">
<div class="q">6. Trigger approach?</div>
<p class="a">Ashish suggested TCP connector / GCS hooks. Eventarc is exactly this — event-driven, Google-managed, not polling.</p>
<div class="verdict">→ Eventarc = the GCS hook that actually works</div>
</div>
</div>
<div class="footer">
<p>Research compiled by Claw 🐾 for Ashish · April 2, 2026</p>
<p>Sources: Google Cloud docs, Vertex AI RAG Engine docs, industry patterns</p>
</div>
</div>
<script>
const observer = new IntersectionObserver(entries => {
entries.forEach(e => { if (e.isIntersecting) e.target.classList.add('visible'); });
}, { threshold: 0.1 });
document.querySelectorAll('.fade-in').forEach(el => observer.observe(el));
</script>
</body>
</html>