Copilot Instructions for fastapi_ml_service

Project Purpose

This is a production-ready FastAPI backend for serving APIs, real-time features, and machine learning models. It supports:

  • RESTful endpoints
  • WebSocket-based real-time communication
  • Secure authentication (JWT/OAuth2)
  • Async database operations (SQL and NoSQL)
  • Distributed ML workloads (e.g., Stable Diffusion via Celery)
  • Observability with logging and metrics

How to Assist

General Code Style

  • Use async and await for all route handlers and DB interactions.
  • Follow PEP8 style with type hints and docstrings.
  • Use Pydantic models for all request bodies and responses.
  • Import Depends and HTTPException from fastapi, not starlette.

Directory Conventions

app/main.py

  • Entry point of the application.
  • Mounts all routers from app/api/*.
  • Loads middleware (CORS, metrics) and instruments Prometheus.

app/api/

  • One file per route group (e.g., predict.py, auth.py).
  • Use APIRouter, not FastAPI().
  • Register routers in main.py.
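A minimal sketch of this router pattern (the file name, prefix, and route below are illustrative assumptions, not taken from the repository):

# app/api/predict.py (sketch)
from fastapi import APIRouter

router = APIRouter(prefix="/predict", tags=["predict"])

@router.get("/ping")
async def ping() -> dict:
    """Tiny route that only exists to illustrate the APIRouter pattern."""
    return {"status": "ok"}

# app/main.py then registers the router once:
#   from app.api import predict
#   app.include_router(predict.router)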

app/models/

  • schemas.py: Pydantic models used in requests/responses.
  • orm.py: SQLAlchemy ORM models mapped to tables.
  • Keep Pydantic and ORM separate.
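For illustration, a sketch of this split assuming SQLAlchemy 2.0 typed mappings (class and field names are examples only):

# app/models/schemas.py -- Pydantic, used at the API boundary
from pydantic import BaseModel

class UserRead(BaseModel):
    id: int
    email: str

# app/models/orm.py -- SQLAlchemy, mapped to database tables
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(unique=True)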

app/services/

  • Contains logic classes/functions for business operations.
  • Avoid using FastAPI dependencies here (keep it pure Python).

app/db/

  • session.py: defines the async session and engine.
  • init_db.py: creates tables and seeds initial data.
  • Use SQLAlchemy 2.0 async style.
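A possible shape for session.py, assuming an asyncpg driver and SQLAlchemy 2.0's async_sessionmaker (the connection string is a placeholder):

# app/db/session.py (sketch)
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/app", echo=False)
AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False)

async def get_session():
    """FastAPI dependency yielding one async session per request."""
    async with AsyncSessionLocal() as session:
        yield session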

app/auth/

  • security.py: handles password hashing, JWT creation/validation.
  • dependencies.py: provides DI for authenticated users.

app/workers/

  • Defines Celery tasks (e.g., text-to-image generation).
  • Tasks must be idempotent and serializable.
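A sketch of what such a task might look like (the broker URL, task name, and arguments are assumptions):

# app/workers/tasks.py (sketch)
from celery import Celery

celery_app = Celery("workers", broker="redis://localhost:6379/0")

@celery_app.task(bind=True, max_retries=3)
def generate_image(self, prompt: str, output_path: str) -> str:
    """Idempotent: the same prompt and path always produce the same file.
    Arguments stay JSON-serializable (plain strings), never live objects."""
    # ... run the diffusion pipeline and write the image to output_path ...
    return output_path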

app/monitoring/

  • metrics.py: Prometheus counters/timers
  • logging_config.py: sets up structured logging with levels
  • Include /metrics and /health endpoints

Auth Rules

  • Use OAuth2 password flow with JWT tokens.
  • Store passwords hashed (e.g., bcrypt).
  • All /users/me and protected routes must use Depends(get_current_user).
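A hedged sketch of the get_current_user dependency using python-jose (the token URL, claim layout, and key handling are assumptions; in practice the key comes from settings, never from source code):

# app/auth/dependencies.py (sketch)
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/auth/token")

async def get_current_user(token: str = Depends(oauth2_scheme)) -> str:
    """Decode the bearer JWT and return the subject, or raise 401."""
    try:
        payload = jwt.decode(token, "SECRET_KEY", algorithms=["HS256"])  # load the key from settings
        return payload["sub"]
    except (JWTError, KeyError):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")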

Performance

  • Load models once at startup (@lru_cache or FastAPI startup event).
  • Avoid re-loading joblib or transformers pipelines per request.
  • Offload blocking model inference to Celery tasks.
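For example, a cached loader along these lines (the model path and loader are assumptions):

# app/services/ml.py (sketch)
from functools import lru_cache
import joblib

@lru_cache(maxsize=1)
def get_model():
    """Loaded once per process; route handlers reuse it via Depends(get_model)."""
    return joblib.load("models/model.joblib")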

Dependencies

Use these libraries:

  • fastapi, pydantic, uvicorn
  • sqlalchemy[asyncio], databases, alembic
  • motor (for MongoDB)
  • passlib, python-jose, bcrypt
  • celery, redis
  • prometheus_fastapi_instrumentator, sentry-sdk, loguru
  • numpy, pandas, scikit-learn, joblib, diffusers, transformers

Test Guidelines

  • All tests should be async def.
  • Use httpx.AsyncClient and fixtures from conftest.py.
  • Mock external services and override dependencies in tests.
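A sketch of such a test, assuming pytest-asyncio and an httpx version where the app is mounted through ASGITransport (import paths and the fake model are illustrative):

# tests/test_predict.py (sketch)
import pytest
from httpx import ASGITransport, AsyncClient

from app.main import app
from app.services.ml import get_model

class FakeModel:
    def predict(self, rows):
        return [0.5]

@pytest.mark.asyncio
async def test_predict_returns_result():
    # Override the model dependency so no real model is loaded
    app.dependency_overrides[get_model] = lambda: FakeModel()
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post("/predict", json={"feature_vector": [1.0, 2.0]})
    assert response.status_code == 200
    app.dependency_overrides.clear()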

Output Examples

REST Endpoint

@router.post("/predict", response_model=PredictionResponse)
async def predict(data: InputData, model=Depends(get_model)):
    result = model.predict([data.feature_vector])
    return PredictionResponse(result=result[0])

WebSocket Handler

@router.websocket("/ws/detect")
async def detect(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_text()
        image = decode_base64_image(data)
        results = detect_objects(image)
        await websocket.send_json(results)

Avoid

  • Avoid blocking I/O (open(), requests) in route handlers.
  • Do not use sync DB calls.
  • Do not define SQLAlchemy and Pydantic models in the same file.
  • Do not store secrets in .py files.

Environment

  • Load environment variables using pydantic.BaseSettings from config.py
  • Use .env and python-dotenv during development
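A minimal config.py sketch along those lines (field names and defaults are examples; with Pydantic v2 the import moves to pydantic_settings):

# app/config.py (sketch)
from pydantic import BaseSettings  # Pydantic v1; use pydantic_settings.BaseSettings on v2

class Settings(BaseSettings):
    database_url: str = "sqlite+aiosqlite:///./dev.db"
    secret_key: str = "change-me"  # supply the real value via the environment

    class Config:
        env_file = ".env"

settings = Settings()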
Data Pipeline Module

The pipeline is organized in three layers:

  1. Ingestion of raw data from APIs
  2. Initial transformation and minimal standardization
  3. Advanced processing (temporal alignment, spatial alignment, etc.)

Everything is organized so it can be used easily from notebooks, scripts, or scheduled pipelines.


📁 Minimal module structure

data_pipeline/
├── pipeline/                    # Core pipeline logic
│   ├── __init__.py
│   ├── ingest.py                # Layer 1: fetch and store RAW data
│   ├── transform.py             # Layer 2: unification + basic standardization
│   ├── process.py               # Layer 3: temporal/spatial alignment, resolution
│   ├── io_utils.py              # Read/write helpers (raw/intermediate/final)
│   └── config.py                # Paths, API keys, general parameters
│
├── data/                        # Local storage organized by stage
│   ├── raw/                     # Original raw data (straight from the API)
│   ├── transformed/             # Unified and standardized data
│   └── processed/               # Final data ready for analysis/models
│
├── notebooks/                   # Notebooks for usage, testing, or exploration
│   └── example_pipeline.ipynb
│
├── requirements.txt
├── README.md
└── setup.py                     # (optional) install as a package

🧩 Stage descriptions

1. ingest.py

  • Calls the external API
  • Stores the results unmodified in data/raw/
  • Can run in batch or real-time mode

2. transform.py

  • Reads files from data/raw/
  • Groups or concatenates batches
  • Normalizes columns, formats, and types
  • Saves the result in data/transformed/

3. process.py

  • Reads from data/transformed/

  • Applies advanced logic such as:

    • Temporal interpolation
    • Alignment of spatial resolutions
    • Frequency synchronization
  • Saves the final results in data/processed/ (see the sketch below)
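For illustration, a hypothetical end-to-end run from a notebook or script; the function names below are placeholders and are not defined in this module:

# Hypothetical usage of the three layers
from pipeline import ingest, transform, process

raw_files = ingest.fetch_from_api(start="2024-01-01", end="2024-01-31")  # writes to data/raw/
tidy_files = transform.standardize(raw_files)                            # writes to data/transformed/
final = process.align_and_resample(tidy_files, freq="1H")                # writes to data/processed/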


Advantages of this structure

  • Traceability: you can audit each transformation layer
  • Modularity: you can test or change one stage without affecting the rest
  • Scalability: easy to integrate with Airflow, Prefect, or DVC later on
  • Notebook-ready: each stage can easily be invoked from notebooks

✅ GitHub Copilot Instructions – Product Development Lifecycle

You are assisting in a structured product development process that evolves across four phases: Prototype, MVP, Release 1, and Evolution. Each phase has specific goals and quality expectations. Testing is required at all stages. Generate code that aligns strictly with the current phase.


🔁 Global Guidelines

  • Always prioritize mature, battle-tested libraries. Do not reimplement:

    • Caching (joblib, diskcache)
    • Logging (logging)
    • Config (pydantic, configparser, dotenv)
    • HTTP clients (requests, httpx)
    • Serialization (json, pandas, pyarrow)
  • Do not create wrappers around cache/log/config unless explicitly instructed.

  • Minimize external dependencies in early phases.

    • Use filesystem/local memory instead of Redis, S3, or cloud services in Prototype and MVP.
  • Testing is never optional.

    • Write tests from Prototype onwards.
    • Use test-first or test-after methods depending on phase and complexity.

🧪 PHASE 1 – PROTOTYPE (Technical Feasibility)

  • Purpose: Validate feasibility with the least code possible.

Copilot Guidelines:

  • Prioritize working code over structure
  • Avoid modularization and input validation
  • Prefer inline or script-style logic
  • Use static inputs, fakes, or hardcoded values
  • Avoid error handling, config files, retries

Testing Expectations:

  • Write minimal unit tests for non-trivial logic
  • Use simple assert statements or test functions at the bottom of the script
  • No test coverage or file structure required
  • Avoid live API calls: use fakes or static fixtures
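For example, a prototype-phase check can be nothing more than an assert at the bottom of the script (the function reuses the even-number example from the prompts later in this document):

def filter_even(numbers):
    return [n for n in numbers if n % 2 == 0]

if __name__ == "__main__":
    # Hardcoded input on purpose: fast feedback, no test framework yet
    assert filter_even([1, 2, 3, 4]) == [2, 4]
    print("ok")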

Agile Practices:

  • XP: Working code over design
  • Lean: Fast feedback, discardable code
  • Scrum: Spike stories for discovery
  • Kanban: Short cycles, visual feedback

🚧 PHASE 2 – MVP (Minimum Viable Product)

  • Purpose: Deliver the smallest usable version of the product with early reliability.

Copilot Guidelines:

  • Apply Test-Driven Development for new logic
  • Modularize: break into fetch, transform, output, etc.
  • Use config files instead of hardcoded values
  • Begin basic validation, retry, fallback mechanisms
  • Add error boundaries where needed

Testing Expectations:

  • Write unit tests before/with implementation
  • Use mocks/fakes instead of live APIs
  • Create test files in a tests/ folder
  • Use pytest-compatible structure
  • Cover all data-transforming or state-changing functions

Agile Practices:

  • XP: TDD, refactor, small commits
  • Scrum: Deliver iteration-ready functionality
  • Kanban: Use WIP limits for testable units
  • Lean: Build only what’s required

🚀 PHASE 3 – RELEASE 1 (Production Ready)

  • Purpose: Build a robust and maintainable product for real usage.

Copilot Guidelines:

  • Follow strict test-first discipline
  • Use typed functions, layered structure, and clear interfaces
  • Handle all error cases and I/O issues
  • Use dependency injection for API/DB access
  • Add docs and CLI/script entrypoints

Testing Expectations:

  • Achieve full unit and integration test coverage
  • Test all modules and code paths, including edge cases
  • Use pytest, coverage, tox or similar
  • Validate all API contracts, model output, and file I/O

Agile Practices:

  • XP: CI, full coverage, clean interfaces
  • Scrum: “Definition of Done” includes tests, docs, type safety
  • Lean: Prevent bugs via tests early
  • Kanban: Visualize tested deliverables

📈 PHASE 4 – EVOLUTION (Scalability, Observability, DX)

  • Purpose: Scale the system, observe health, and support developer workflows.

Copilot Guidelines:

  • Maintain 100% test coverage
  • Add regression, contract, CLI, and e2e tests
  • Optimize tests for performance, reproducibility, reliability
  • Integrate metrics, health checks, profiling
  • Add CI pipelines, Docker, Makefiles, CLI tools

Testing Expectations:

  • Cover: CLI usage, full pipeline integration, model regression
  • Validate: data schema, pipeline behavior, prediction output
  • Monitor test suite health: speed, flakiness, failure rate
  • Integrate with GitHub Actions or equivalent CI/CD

Agile Practices:

  • XP: Shared ownership, pair programming
  • Scrum: Continuous improvement, automation
  • Lean: Test-based telemetry and learning
  • Kanban: Alerting and auto-recovery from regressions

🧾 Summary Table

| Phase | Focus | Code Quality | Test Requirement | Agile Priority |
| --- | --- | --- | --- | --- |
| Prototype | Feasibility | Minimal | Minimal unit tests | XP (runs), Lean (fast learn) |
| MVP | Working logic | Moderate | Unit tests + mocks | XP (TDD), Scrum (demo-ready) |
| Release | Robust product | High | Full coverage required | XP (CI), Scrum ("Done" = tested) |
| Evolution | Scale and reliability | Very High | Full + regression | XP + Kanban + Lean (observability) |

🧠 Prompting Guidelines

  • Prototype:

    • "Write a minimal script that runs end-to-end"
    • "Include assert to check shape or content"
  • MVP:

    • "Write test before function"
    • "Use fake API result to test transformer"
  • Release 1:

    • "Add unit and integration tests"
    • "Document function and validate inputs"
  • Evolution:

    • "Write contract test for schema"
    • "Test CLI runner for full pipeline"

⛔ Out of Scope

  • Custom logging/config/cache modules (unless requested)
  • Terraform, CDK, or cloud infra definitions
  • Production code without associated test in MVP+
  • Real API calls in tests (must use mocks/stubs)

GitHub Copilot Instructions – Product Development Lifecycle

You are assisting in a structured product development process that evolves across defined phases: Prototype, MVP, Release 1, and Evolution. Each phase has specific goals, expectations, and code quality requirements.

Generate code that aligns strictly with the active development phase.


PHASE 1 – PROTOTYPE (Technical Feasibility)

Purpose: Quickly test if a feature or integration is technically possible. Focus is on core logic and flow.

Copilot Guidelines:

  • Write only the code required to demonstrate basic functionality
  • Use dummy or hardcoded data
  • No input validation
  • No error handling
  • No modularization
  • No comments unless critical
  • No typing or documentation
  • Flat, inline, or script-style code preferred

Example Prompts:

  • "Prototype a FastAPI endpoint that echoes a POSTed JSON."
  • "Show a function that filters even numbers from a list."

PHASE 2 – MVP (Minimum Viable Product)

Purpose: Build a functional version with the essential use cases covered. Begin addressing stability and structure.

Copilot Guidelines:

  • Add minimal error handling and input validation
  • Refactor logic into reusable functions
  • Introduce types where helpful
  • Modularize code into clear files or components
  • Use config files or environment variables instead of hardcoded values
  • Comments only where necessary to understand logic

Example Prompts:

  • "Convert prototype to include validation and fallback logic."
  • "Split the logic into service and handler functions."

PHASE 3 – RELEASE 1 (Production Ready)

Purpose: Prepare for deployment and real-world usage. Focus on robustness, clarity, and maintainability.

Copilot Guidelines:

  • Full input validation and error handling
  • Type annotations throughout
  • Layered structure: controller, service, model, schema, etc.
  • Add logging, configuration, and extensibility
  • Write documentation and docstrings for all public code
  • Include unit and integration tests

Example Prompts:

  • "Refactor MVP with full typing and docstrings."
  • "Add test cases for edge cases and error scenarios."

PHASE 4 – EVOLUTION (Scaling, Observability, Maintenance)

Purpose: Ensure long-term maintainability and scale. Add observability, developer experience tooling, and infrastructure readiness.

Copilot Guidelines:

  • Introduce metrics, tracing, health checks
  • CI/CD integration (GitHub Actions, Docker, etc.)
  • Package management and dependency control
  • Add linting, formatting, test coverage
  • Create CLI tools, SDKs, or language bindings if required
  • Structure repository for collaboration and open-source contribution

Example Prompts:

  • "Add Prometheus metrics endpoint."
  • "Generate GitHub Actions CI workflow."

SUMMARY OF PHASE EXPECTATIONS

| Phase | Focus | Code Quality | Error Handling | Structure | Typing & Docs |
| --- | --- | --- | --- | --- | --- |
| Prototype | Feasibility | Minimal | None | Flat | None |
| MVP | Core Usability | Moderate | Basic | Functional | Partial |
| Release 1 | Production Baseline | High | Full | Layered | Complete |
| Evolution | Scale & Maintainability | Very High | Advanced | Modular | Complete + Docs |

Prompting Guidelines

To ensure Copilot understands your phase-specific intent, use prompt styles like:

  • Prototype: "Show a minimal version of..."
  • MVP: "Add validation and modularize the logic for..."
  • Release 1: "Refactor this into production-level code with types, docs, and tests."
  • Evolution: "Add metrics, observability, and Docker support for this module."

Out of Scope

Unless explicitly requested, Copilot should not generate:

  • Complex infrastructure (e.g., Terraform, AWS CDK)
  • Business rule validation logic
  • Multi-language integration layers
  • Full-scale observability stacks

Microsoft Access Application Best Practices for GitHub Copilot

This guide outlines best practices for developing Microsoft Access applications using forms, subforms, navigation panels, and VBA logic. It includes performance tips, naming conventions, and UI/UX guidance for older displays and fixed-size layouts.


Language and Error Handling

  • All error messages must be displayed in Spanish.
  • User interface text should be in Spanish where applicable.
  • Comments in code can be in English for developer clarity.
  • Do not use emoji in any code, comments, or documentation.

UI Architecture & Navigation

  • Use a Navigation Form as the central dashboard.
  • Organize forms by functional area (e.g., Customers, Orders, Products).
  • Implement tab controls to group related fields/data.
  • Use subforms and modal dialogs for detail views or advanced editing.
  • Avoid overloading a single screen—split into logical, focused forms.

UI/UX Best Practices for Old Monitors (1024×768 or 1280×800)

Target resolution: 1024×768 px (safe zone for legacy monitors)

  • Fixed Form Size: Set main forms to a fixed size of approx.:
    • Width: 960 px
    • Height: 700 px
    This ensures compatibility across older displays and avoids scrollbars.
  • Set Auto Resize = No and Auto Center = Yes in form properties.
  • Set Fit to Screen = No to prevent unexpected resizing on load.
  • Use fonts like Tahoma, Segoe UI, or Arial at 9pt–10pt for readability.
  • Avoid overlapping pop-up forms; use modal forms with defined boundaries.
  • Keep buttons aligned left or top where screen space is limited.
  • Avoid full-screen forms unless the target environment supports widescreens.

Forms & Subforms Guidelines

Master-Detail Pattern Implementation

  • Primary Form (Master): Contains parent record data with navigation controls
  • Subform (Detail): Embedded form showing related child records
  • Link parent and child forms via Link Master Fields and Link Child Fields
  • Master-detail synchronization best practices:
    ' In master form's On Current event
    Private Sub Form_Current()
        On Error GoTo ErrorHandler
        
        If Not IsNull(Me.ID) Then
            ' Requery subform when master record changes
            Me.subfrmDetails.Form.Requery
            ' Update subform filter if needed
            Me.subfrmDetails.Form.Filter = "MasterID = " & Me.ID
            Me.subfrmDetails.Form.FilterOn = True
        End If
        
        Exit Sub
    ErrorHandler:
        MsgBox "Error al sincronizar registros: " & Err.Description, vbCritical, "Error del Sistema"
    End Sub
  • Subform Configuration:
    • Use Continuous Forms instead of datasheets for better formatting control
    • Set Allow Additions = Yes for new child records
    • Set Allow Deletions based on business rules
    • Configure Link Master Fields = "MasterID" (parent key field)
    • Configure Link Child Fields = "MasterID" (foreign key field)
  • Navigation Synchronization:
    ' In subform's After Update event
    Private Sub Form_AfterUpdate()
        ' Refresh master form calculations if needed
        Me.Parent.Recalc
        ' Update master form totals
        Me.Parent.Requery
    End Sub
  • Keep subform queries optimized and filtered on load
  • Use Me.Dirty = False to save master record before adding details
  • Implement cascading operations (delete details when master is deleted)

Filter and Grid Display Guidelines

  • Search and Filter Implementation:
    ' Global search function for forms
    Private Sub txtBuscar_AfterUpdate()
        On Error GoTo ErrorHandler
        
        Dim strFilter As String
        If Not IsNull(Me.txtBuscar) And Len(Me.txtBuscar) > 0 Then
            ' Build filter string for multiple fields
            strFilter = "[Nombre] Like '*" & Me.txtBuscar & "*' OR " & _
                       "[Apellido] Like '*" & Me.txtBuscar & "*' OR " & _
                       "[Email] Like '*" & Me.txtBuscar & "*'"
            
            Me.Filter = strFilter
            Me.FilterOn = True
        Else
            Me.FilterOn = False
        End If
        
        Exit Sub
    ErrorHandler:
        MsgBox "Error al filtrar registros: " & Err.Description, vbCritical, "Error del Sistema"
    End Sub
  • Advanced Filter Options:
    ' Combo box filter implementation
    Private Sub cmbFiltroEstado_AfterUpdate()
        On Error GoTo ErrorHandler
        
        If Not IsNull(Me.cmbFiltroEstado) Then
            Me.Filter = "[Estado] = '" & Me.cmbFiltroEstado & "'"
            Me.FilterOn = True
        Else
            Me.FilterOn = False
        End If
        
        ' Update record count label
        Me.lblConteoRegistros.Caption = "Registros: " & Me.RecordsetClone.RecordCount
        
        Exit Sub
    ErrorHandler:
        MsgBox "Error al aplicar filtro: " & Err.Description, vbCritical, "Error del Sistema"
    End Sub
  • Grid Display Best Practices:
    • Use Continuous Forms for better control over appearance
    • Set row height to accommodate 9-10pt fonts
    • Implement alternating row colors for better readability
    • Add horizontal lines between records if needed
    • Configure column widths to fit 960px form width
    • Use text boxes instead of labels for data display (better performance)
  • Grid Navigation and Selection:
    ' Handle record selection in continuous forms
    Private Sub Form_Current()
        On Error GoTo ErrorHandler
        
        ' Highlight current record
        If Not IsNull(Me.ID) Then
            Me.Detail.BackColor = RGB(220, 235, 250)  ' Light blue highlight
        End If
        
        ' Update status bar or related controls
        Me.lblRegistroActual.Caption = "Registro " & (Me.CurrentRecord) & " de " & Me.RecordsetClone.RecordCount
        
        Exit Sub
    ErrorHandler:
        MsgBox "Error en navegación: " & Err.Description, vbCritical, "Error del Sistema"
    End Sub
  • Performance Optimization for Grids:
    • Limit initial recordset size (use TOP 100 or similar)
    • Implement paging for large datasets
    • Use indexes on filtered fields
    • Avoid calculated fields in continuous forms when possible
  • Clear Filter Functionality:
    Private Sub btnLimpiarFiltro_Click()
        On Error GoTo ErrorHandler
        
        ' Clear all filter controls
        Me.txtBuscar = Null
        Me.cmbFiltroEstado = Null
        Me.FilterOn = False
        
        ' Refresh record count
        Me.lblConteoRegistros.Caption = "Registros: " & Me.RecordsetClone.RecordCount
        
        Exit Sub
    ErrorHandler:
        MsgBox "Error al limpiar filtros: " & Err.Description, vbCritical, "Error del Sistema"
    End Sub

Performance Optimization

  • Use WHERE conditions when opening forms:
    DoCmd.OpenForm "frmOrders", , , "CustomerID = 42"
  • Split frontend/backend architecture:
    • Backend (*.accdb or SQL): tables only.
    • Frontend: forms, logic, and reports with linked tables.
  • Load subforms and large data only when needed (Visible = False then Load on demand).

Logic & VBA Standards

  • Centralize logic in shared modules (modValidation, modUtils, etc.).
  • Create reusable functions in dedicated modules:
    • modFuncionesFormulario - Form-related utility functions including master-detail synchronization
    • modFuncionesValidacion - Data validation functions
    • modFuncionesBD - Database operation functions
    • modFuncionesReportes - Report generation functions
  • Master-Detail Reusable Functions: Create standardized functions in modFuncionesFormulario for:
    ' Synchronize master-detail forms
    Public Function SincronizarMaestroDetalle(frmMaster As Form, subfrmName As String, masterKeyField As String) As Boolean
    ' Validate master record before detail operations
    Public Function ValidarMaestroParaDetalle(frmMaster As Form) As Boolean
    ' Calculate totals from detail records
    Public Function CalcularTotalesDetalle(subfrmName As String, sumField As String) As Double
    ' Handle cascading delete operations
    Public Function EliminarDetallesCascada(masterID As Long, detailTable As String, foreignKeyField As String) As Boolean
  • Common reusable functions should include:
    • Form navigation and state management
    • Master-detail form synchronization helpers
    • Master-detail validation and calculation functions
    • Data formatting and conversion
    • Input validation routines
    • Database connection helpers
    • Common UI operations (enable/disable controls, etc.)
  • Standardize error handling using:
    On Error GoTo ErrorHandler
    ' Error messages must be in Spanish
    MsgBox "Error: " & Err.Description, vbCritical, "Error del Sistema"
  • Use events: On Load, After Update, On Current, etc., for dynamic UI behavior.
  • Prefer VBA over macros for robustness and better debugging.
  • All user-facing error messages must be in Spanish.

UI/UX Consistency

  • Use consistent fonts, colors, and control spacing across all forms.
  • Label every input field clearly. Add tooltips where needed.
  • Set logical tab order (Tab Index) across every form.
  • Standardize button locations (e.g., Save/Cancel always bottom-right).
  • Use icons sparingly and only where they add clear value.

Security & Data Integrity

  • Hide backend objects and disable direct navigation where needed.
  • Use logic-based permissions to hide or disable UI components by role.
  • Implement basic audit logging using hidden tables or append queries.
  • Use read-only forms for sensitive data views.

File Types Used in Microsoft Access (Visual Summary)

| Category | Main extensions | Description |
| --- | --- | --- |
| Databases | .accdb, .mdb, .accde, .mde | Database files. .accdb is the modern format; .accde is compiled without editable code. |
| Code modules | .bas, .cls, .vba | Exported VBA code. .bas for standard modules, .cls for forms/classes, .vba is less common. |
| Forms and reports | .frm, .frx, .rep | Visual structures exported as text (used with version control or automation). |
| Data and structure | .sql, .csv, .xml, .txt, .json | Structured data or SQL scripts for creating/importing data. |
| Macros and templates | .accda, .accdt, .maf | Add-ins, templates, and exported macros. |
| External integration | .xlsx, .xls, .pdf, .html | Files for importing/exporting data and reports. |
| Control and security | .laccdb, .ldb, .bak, .udl, .dsn | Files generated by Access for multi-user locking, backups, and external connections. |

Note: Many of these files are not handled directly from the Access interface, but they are key in development, integration, and professional maintenance environments.

File Type Usage Guidelines

  • Development Environment: Use .accdb for development and .accde for production deployment
  • Version Control: Export forms and modules as .bas, .cls, and .frm files for source control
  • Data Exchange: Prefer .csv or .xlsx for data imports/exports over proprietary formats
  • Backup Strategy: Maintain .bak files and monitor .laccdb files for multi-user scenarios
  • Documentation: Export database schema to .sql files for documentation and deployment scripts

Naming Conventions

| Object | Prefix | Example |
| --- | --- | --- |
| Table | tbl | tblCustomers |
| Query | qry | qrySalesByRegion |
| Form | frm | frmInvoices |
| Subform | subfrm | subfrmInvoiceItems |
| Module | mod | modBusinessRules |
| Controls | txt, cmb, lbl, etc. | txtFirstName, cmbStatus |

Language Conventions for Code Elements

  • Function Names: Use English with descriptive names
    Public Function SynchronizeMasterDetail() As Boolean
    Public Function ValidateCustomerData() As Boolean
    Public Function CalculateOrderTotal() As Double
  • Variable Names: Use English with Hungarian notation where appropriate
    Dim strCustomerName As String
    Dim intOrderCount As Integer
    Dim blnIsValid As Boolean
  • Table Field Names: Keep original Spanish field names as they appear in the database
    ' Reference Spanish field names directly
    strFilter = "[Nombre] Like '*" & searchText & "*'"
    Me.txtNombre = recordset!Nombre
    If Not IsNull(Me.FechaCreacion) Then...
  • Constants: Use English with uppercase naming
    Const MAX_RECORDS As Integer = 1000
    Const DEFAULT_STATUS As String = "Activo"

Development Practices

  • Maintain separate dev and prod frontend files.
  • Use TempVars or hidden forms to pass global values between forms.
  • Track application version with a lblVersion control on the main form.
  • Document complex VBA logic inline with comments for maintainability.

Copilot Contribution Guidelines

  • Perform only the specific task requested - Do not add extra features, suggestions, or improvements unless explicitly asked to perform those specific tasks
  • Follow the naming conventions listed above.
  • Use English for function and variable names, Spanish for table field references
  • Use modular design with a single responsibility per form/subform.
  • Create reusable functions in appropriate utility modules (modFuncionesFormulario, modFuncionesValidacion, etc.)
  • Always create reusable master-detail functions instead of repeating synchronization code across multiple forms
  • When suggesting VBA:
    • Include comments and error handling.
    • Validate data inputs before committing to the database.
    • Error messages must be in Spanish.
    • Consider if the code should be a reusable function in a utility module.
    • For master-detail operations, always suggest creating or using functions from modFuncionesFormulario
    • Use English for function/variable names, maintain Spanish table field names
  • Avoid macros unless requested explicitly.
  • When generating forms, ensure the form layout fits within 1024×768 px.
  • Never use emoji in code, comments, or documentation.

This project is built to support legacy hardware, enforce UX consistency, and keep logic clean. Use these standards to maximize maintainability and user satisfaction.

MASTER PLAN FOR DEVELOPING A MICROSOFT ACCESS SYSTEM


1. GENERAL OBJECTIVE

Develop a comprehensive Microsoft Access system to manage specific information (people, products, operations, etc.) with a user-friendly interface, role-based access control, and automated functionality.


2. PROJECT STAGES


2.1 FUNCTIONAL ANALYSIS AND DESIGN

2.1.1 Requirements Gathering

  • Define the system's scope and functional objective
  • Identify system actors and roles (Administrator, User, Read-only)
  • Determine the key processes to computerize
  • Establish technical, legal, and security requirements

2.1.2 Data Model Design

  • Diagram the entity-relationship model (ERD)
  • Define the main entities, attributes, and relationships
  • Establish primary and foreign keys
  • Normalize tables up to 3NF (where applicable)

2.1.3 Initial Technical Document

  • Catalog of tables with fields, data types, and properties
  • Validation rules and value lists
  • Requirements for forms, queries, reports, and business logic

2.2 DATABASE IMPLEMENTATION (BACKEND)

2.2.1 Table Structure Creation

  • Build the tables in Microsoft Access
  • Configure the correct data types (Text, Date/Time, Number, Yes/No)
  • Assign properties: required, default value, field size

2.2.2 Relationship Setup

  • Implement foreign keys and one-to-many and many-to-many relationships (with junction tables)
  • Enable referential integrity
  • Enable cascade actions (update/delete) as needed

2.2.3 Initial Master Data Load

  • Enter static reference data (Roles, Categories, Types, Statuses)
  • Verify key consistency

2.3 USER INTERFACE DESIGN (FRONTEND)

2.3.1 Main Forms

  • Create forms for adding, deleting, and editing records
  • Efficient use of controls: combo boxes, text boxes, buttons, date fields
  • Include navigation and action buttons

2.3.2 Related Forms (Subforms)

  • Master-detail relationships, for example Customers > Orders
  • Linked via related fields

2.3.3 Main Menu and Navigation

  • Design a main menu form with shortcuts
  • Add buttons to open the main modules and reports

2.3.4 User Authentication (optional)

  • Login form with credential verification
  • Visibility and access control by role

2.4 SYSTEM LOGIC AND AUTOMATION (VBA)

2.4.1 Module Structure

  • Create standard modules: security, utilities, validation
  • Well-organized, documented code

2.4.2 Event Programming

  • Automatic validation on save
  • Confirmation of critical actions
  • Automatic calculations and dynamic field population

2.4.3 Advanced Features

  • Export data to Excel or PDF
  • Automated email sending
  • Generation of unique keys or codes

2.5 QUERIES AND REPORTS

2.5.1 Queries

  • Operational queries (active listings, searches)
  • Statistical queries (by period, category, etc.)

2.5.2 Reports

  • Layouts prepared for printing and export
  • Dynamic filters driven by forms
  • Built-in charts (bar, pie, line)

2.6 SECURITY AND AUDITING

2.6.1 Access Control

  • Role-based restrictions on forms, menus, and buttons
  • Validation at login or when accessing protected sections

2.6.2 Change Logging

  • Log table for basic auditing (tbl_Log)
  • Additional fields: creation date, modifying user

2.7 TESTING AND DEBUGGING

2.7.1 Functional Testing

  • Verify every form and report
  • Simulate the different workflows

2.7.2 Error Handling

  • Handle errors with On Error
  • Clear messages, in Spanish, aimed at the end user

2.8 DELIVERY, DOCUMENTATION, AND SUPPORT

2.8.1 Final Environment Preparation

  • Compile to .accde format for operational use
  • Configure backups

2.8.2 Technical and Functional Documentation

  • User manual
  • Database diagram
  • List of modules, forms, queries, and reports

2.8.3 Maintenance and Support

  • Channel for reporting bugs and improvements
  • Roadmap of future versions

3. DEVELOPMENT TIME ESTIMATE

| Stage | Estimated Duration |
| --- | --- |
| 2.1 Analysis and Design | 4–6 hours |
| 2.2 Backend | 4 hours |
| 2.3 User Interface | 6–8 hours |
| 2.4 VBA Programming | 6–10 hours |
| 2.5 Queries and Reports | 3–4 hours |
| 2.6 Security and Auditing | 2–3 hours |
| 2.7 Testing and Debugging | 2 hours |
| 2.8 Documentation and Delivery | 2 hours |
| Total (approx.) | 30–40 hours |

Copilot Instructions for fastapi_starter_kit

Project Purpose

This is a production-ready FastAPI backend for serving APIs, real-time features, and machine learning models. It supports:

  • RESTful endpoints
  • WebSocket-based real-time communication
  • Secure authentication (JWT/OAuth2)
  • Async database operations (SQL and NoSQL)
  • Distributed ML workloads (e.g., Stable Diffusion via Celery)
  • Observability with logging and metrics

Project Structure

This project uses Poetry for dependency management and packaging. The main application code is located in src/fastapi_starter_kit/.


How to Assist

General Code Style

  • Use async and await for all route handlers and DB interactions.
  • Follow PEP8 style with type hints and docstrings.
  • Use Pydantic models for all request bodies and responses.
  • Import Depends and HTTPException from fastapi, not starlette.
  • Use typing annotations for better code clarity and IDE support.

Directory Conventions

src/fastapi_starter_kit/main.py

  • Entry point of the application.
  • Mounts all routers from src/fastapi_starter_kit/api/*.
  • Loads middleware (CORS, metrics) and instruments Prometheus.
  • Include startup and shutdown event handlers.

src/fastapi_starter_kit/api/

  • One file per route group (e.g., predict.py, auth.py).
  • Use APIRouter, not FastAPI().
  • Register routers in main.py with proper prefixes and tags.

src/fastapi_starter_kit/models/

  • schemas.py: Pydantic models used in requests/responses.
  • orm.py: SQLAlchemy ORM models mapped to tables.
  • Keep Pydantic and ORM separate for clean architecture.

src/fastapi_starter_kit/services/

  • Contains logic classes/functions for business operations.
  • Avoid using FastAPI dependencies here (keep it pure Python).
  • Use dependency injection pattern for testability.

src/fastapi_starter_kit/db/

  • session.py: defines the async session and engine.
  • init_db.py: creates tables and seeds initial data.
  • Use SQLAlchemy 2.0 async style with proper session management.

src/fastapi_starter_kit/auth/

  • security.py: handles password hashing, JWT creation/validation.
  • dependencies.py: provides DI for authenticated users.
  • Use secure JWT practices with proper expiration and refresh tokens.
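A hedged sketch of security.py along those lines, using passlib's bcrypt context and python-jose (the key and expiry below are placeholders and should come from Settings):

# src/fastapi_starter_kit/auth/security.py (sketch)
from datetime import datetime, timedelta, timezone
from jose import jwt
from passlib.context import CryptContext

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

def hash_password(password: str) -> str:
    return pwd_context.hash(password)

def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)

def create_access_token(subject: str, expires_minutes: int = 30) -> str:
    """Short-lived access token; pair it with a longer-lived refresh token."""
    expire = datetime.now(timezone.utc) + timedelta(minutes=expires_minutes)
    payload = {"sub": subject, "exp": int(expire.timestamp())}
    return jwt.encode(payload, "SECRET_KEY", algorithm="HS256")  # the key belongs in Settings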

src/fastapi_starter_kit/workers/

  • Defines Celery tasks (e.g., text-to-image generation).
  • Tasks must be idempotent and serializable.
  • Use proper error handling and retry mechanisms.

src/fastapi_starter_kit/monitoring/

  • metrics.py: Prometheus counters/timers
  • logging_config.py: sets up structured logging with levels
  • Include /metrics and /health endpoints for observability

src/fastapi_starter_kit/config.py

  • Application configuration using pydantic.BaseSettings
  • Load environment variables from .env files
  • Use pydantic_settings for better configuration management

tests/

  • Test files organized to mirror the src/fastapi_starter_kit/ structure
  • Use conftest.py for shared fixtures
  • Follow pytest best practices with proper async test setup
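For example, a conftest.py fixture might look like this sketch (assuming pytest-asyncio and httpx's ASGITransport; the import path mirrors the uvicorn run command used in this project):

# tests/conftest.py (sketch)
import pytest_asyncio
from httpx import ASGITransport, AsyncClient

from src.fastapi_starter_kit.main import app

@pytest_asyncio.fixture
async def client():
    """Async HTTP client bound to the app, shared by API tests."""
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as c:
        yield c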

Package Management with Poetry

Adding Dependencies

# Production dependencies
poetry add "fastapi>=0.115.0" "uvicorn[standard]>=0.34.0" "sqlalchemy>=2.0.0"

# Development dependencies  
poetry add --group dev pytest black isort mypy pre-commit

# Install dependencies
poetry install

Running the Application

# Activate virtual environment
poetry shell

# Run with uvicorn
poetry run uvicorn src.fastapi_starter_kit.main:app --reload --host 0.0.0.0 --port 8000

# Run tests
poetry run pytest

# Format code
poetry run black src/ tests/
poetry run isort src/ tests/

Auth Rules

  • Use OAuth2 password flow with JWT tokens.
  • Store passwords hashed using bcrypt or argon2.
  • All protected routes must use Depends(get_current_user).
  • Implement proper token refresh mechanisms.
  • Use HTTPS in production environments.

Performance

  • Load models once at startup using FastAPI lifespan events.
  • Avoid re-loading joblib or transformers pipelines per request.
  • Offload blocking model inference to Celery tasks.
  • Use connection pooling for database operations.
  • Implement proper caching strategies with Redis.
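A sketch of loading a model once through a lifespan handler (the joblib path and app.state attribute are assumptions):

# src/fastapi_starter_kit/main.py (sketch of the lifespan part)
from contextlib import asynccontextmanager
from fastapi import FastAPI
import joblib

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: load once, reuse from app.state.model in every request
    app.state.model = joblib.load("models/model.joblib")
    yield
    # Shutdown: release resources here if needed

app = FastAPI(lifespan=lifespan)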

Dependencies

Use these libraries via Poetry (with version constraints):

[project]
name = "fastapi-starter-kit"
version = "0.1.0"
description = ""
authors = [
    {name = "Mario Garcia", email = "[email protected]"}
]
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "fastapi>=0.115.12,<0.116.0",
    "uvicorn[standard]>=0.34.2,<0.35.0",
    "sqlalchemy>=2.0.41,<3.0.0",
    "alembic>=1.16.1,<2.0.0",
    "pydantic>=2.0.0,<3.0.0",
    "pydantic-settings>=2.0.0,<3.0.0",
    "databases[postgresql]>=0.9.0,<1.0.0",
    "asyncpg>=0.29.0,<1.0.0",
    "passlib[bcrypt]>=1.7.4,<2.0.0",
    "python-jose[cryptography]>=3.3.0,<4.0.0",
    "celery>=5.3.0,<6.0.0",
    "redis>=5.0.0,<6.0.0",
    "prometheus-fastapi-instrumentator>=7.0.0,<8.0.0",
    "sentry-sdk[fastapi]>=2.0.0,<3.0.0",
    "loguru>=0.7.0,<1.0.0",
    "python-dotenv>=1.0.0,<2.0.0"
]

[tool.poetry.group.dev.dependencies]
black = "^25.1.0"
isort = "^6.0.1"
pytest = "^8.3.5"
httpx = "^0.28.1"
pytest-asyncio = "^0.25.0"
mypy = "^1.10.0"
pre-commit = "^4.0.0"
pytest-cov = "^6.0.0"

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
addopts = "-v --cov=src/fastapi_starter_kit --cov-report=term-missing"

[tool.black]
line-length = 88
target-version = ['py312']
include = '\.pyi?$'
extend-exclude = '''
/(
  migrations
)/
'''

[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88

Test Guidelines

  • All tests should be async def when testing async code.
  • Use httpx.AsyncClient for API testing.
  • Use fixtures from conftest.py for common setup.
  • Mock external services and override dependencies in tests.
  • Aim for high test coverage (>90%).
  • Run tests with: poetry run pytest
  • Use pytest-asyncio for proper async test handling.

Output Examples

REST Endpoint with Error Handling

# src/fastapi_starter_kit/api/predict.py
from fastapi import APIRouter, Depends, HTTPException, status
from ..models.schemas import InputData, PredictionResponse, ErrorResponse
from ..services.ml import get_model
from ..auth.dependencies import get_current_user

router = APIRouter(prefix="/api/v1", tags=["predictions"])

@router.post(
    "/predict", 
    response_model=PredictionResponse,
    responses={
        400: {"model": ErrorResponse},
        401: {"model": ErrorResponse},
        500: {"model": ErrorResponse}
    }
)
async def predict(
    data: InputData, 
    model=Depends(get_model),
    current_user=Depends(get_current_user)
):
    """Generate predictions using the ML model."""
    try:
        result = await model.predict_async([data.feature_vector])
        return PredictionResponse(
            result=result[0], 
            confidence=result[1],
            user_id=current_user.id
        )
    except ValueError as e:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Invalid input data: {str(e)}"
        )
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Prediction service unavailable"
        )

WebSocket Handler with Error Handling

# src/fastapi_starter_kit/api/realtime.py
from fastapi import APIRouter, WebSocket, WebSocketDisconnect
from ..services.detection import detect_objects, decode_base64_image
from ..monitoring.metrics import websocket_connections
from datetime import datetime
import logging

router = APIRouter(prefix="/ws", tags=["websockets"])
logger = logging.getLogger(__name__)

@router.websocket("/detect")
async def detect_realtime(websocket: WebSocket):
    """Real-time object detection via WebSocket."""
    await websocket.accept()
    websocket_connections.inc()
    
    try:
        while True:
            data = await websocket.receive_text()
            try:
                image = decode_base64_image(data)
                results = await detect_objects(image)
                await websocket.send_json({
                    "status": "success",
                    "results": results,
                    "timestamp": datetime.utcnow().isoformat()
                })
            except Exception as e:
                logger.error(f"Detection error: {str(e)}")
                await websocket.send_json({
                    "status": "error",
                    "message": "Detection failed"
                })
    except WebSocketDisconnect:
        logger.info("WebSocket client disconnected")
    finally:
        websocket_connections.dec()

Configuration with Pydantic Settings

# src/fastapi_starter_kit/config.py
from pydantic_settings import BaseSettings
from typing import Optional

class Settings(BaseSettings):
    """Application settings loaded from environment variables."""
    
    # Application
    app_name: str = "FastAPI Starter Kit"
    debug: bool = False
    
    # Database
    database_url: str
    database_echo: bool = False
    
    # Authentication
    secret_key: str
    algorithm: str = "HS256"
    access_token_expire_minutes: int = 30
    
    # Redis
    redis_url: str = "redis://localhost:6379"
    
    # Monitoring
    sentry_dsn: Optional[str] = None
    log_level: str = "INFO"
    
    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"

settings = Settings()

Avoid

  • Avoid blocking I/O (open(), requests) in route handlers - use aiofiles and httpx.
  • Do not use sync DB calls - always use async SQLAlchemy sessions.
  • Do not define SQLAlchemy and Pydantic models in the same file.
  • Do not store secrets in .py files - use environment variables.
  • Avoid using @lru_cache for async functions - use proper async caching.
  • Do not ignore error handling - always provide meaningful error responses.

Environment

  • Load environment variables using pydantic_settings.BaseSettings from config.py
  • Use .env files during development with python-dotenv
  • Keep sensitive configuration in environment variables, not in code
  • Use different configuration files for different environments (dev, staging, prod)
  • Validate all environment variables at startup

Production Considerations

  • Use proper logging with structured logs (JSON format)
  • Implement health checks and readiness probes
  • Use connection pooling for database connections
  • Implement proper rate limiting
  • Use HTTPS and secure headers in production
  • Monitor application metrics with Prometheus
  • Set up proper error tracking with Sentry
  • Use gunicorn with uvicorn workers (or another ASGI server) for production deployment

Pragmatic Cursor Rules - Azure Document Intelligence

Core Philosophy: KISS (Keep It Simple, Stupid)

Focus ONLY on what we need RIGHT NOW. No over-engineering.

What We DON'T Need (Avoid These)

Complex Architecture Patterns

  • No dependency injection containers
  • No abstract factories or builders
  • No complex inheritance hierarchies
  • No repository patterns for simple API calls

Over-Engineering

  • No custom config management systems (use simple env vars)
  • No complex caching layers (start with simple @lru_cache)
  • No event systems or observers
  • No custom logging frameworks (use standard logging)

Premature Optimization

  • No async unless explicitly needed
  • No connection pooling until we hit limits
  • No batch processing until we have multiple documents
  • No performance monitoring until we have performance issues

Enterprise Patterns We Don't Need

  • No unit of work patterns
  • No specification patterns
  • No strategy patterns for simple conditionals
  • No facade patterns for single API calls

What We DO Need (Focus on These)

Simple, Direct Code

# Good: Direct and simple
def analyze_invoice(self, file_path: str) -> dict:
    with open(file_path, 'rb') as f:
        result = self.client.begin_analyze_document("prebuilt-invoice", f).result()
    return result

# Bad: Over-engineered
class DocumentAnalysisService:
    def __init__(self, strategy_factory: AnalysisStrategyFactory):
        self._strategy_factory = strategy_factory
        
    def analyze(self, request: AnalysisRequest) -> AnalysisResponse:
        strategy = self._strategy_factory.create_strategy(request.document_type)
        return strategy.execute(request)

Basic Error Handling

# Good: Simple try/catch
def analyze_document(self, model_id: str, file_path: str):
    try:
        with open(file_path, 'rb') as f:
            return self.client.begin_analyze_document(model_id, f).result()
    except FileNotFoundError:
        raise ValueError(f"File not found: {file_path}")
    except Exception as e:
        raise RuntimeError(f"Analysis failed: {e}")

# Bad: Complex error hierarchy
class DocumentIntelligenceError(Exception): pass
class AuthenticationError(DocumentIntelligenceError): pass
class ModelNotFoundError(DocumentIntelligenceError): pass
class ValidationError(DocumentIntelligenceError): pass

Minimal Type Hints

# Good: Essential types only
def analyze_invoice(self, file_path: str) -> dict:
    pass

# Bad: Over-specified types
def analyze_invoice(
    self, 
    file_path: Union[str, Path, os.PathLike[str]], 
    options: Optional[AnalysisOptions] = None
) -> AnalyzeResult:
    pass

Code Structure Rules

1. Single File Until It's Too Big

Start with ONE main file:

azure_doc_wrapper.py  # Everything goes here initially

Only split when the file gets over 500 lines or has clearly separate concerns.

2. No Abstract Classes

# Good: Simple class
class DocumentAnalyzer:
    def __init__(self, endpoint: str, key: str):
        self.client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))
    
    def analyze_invoice(self, file_path: str):
        # Direct implementation

# Bad: Abstract base class
from abc import ABC, abstractmethod

class DocumentAnalyzer(ABC):
    @abstractmethod
    def analyze(self, document: Document) -> Result:
        pass

3. Configuration: Environment Variables Only

# Good: Simple environment variables
import os

class DocumentAnalyzer:
    def __init__(self):
        self.endpoint = os.getenv('AZURE_DOC_ENDPOINT')
        self.key = os.getenv('AZURE_DOC_KEY')

# Bad: Complex configuration system
from dataclasses import dataclass
from typing import Optional

@dataclass
class Config:
    endpoint: str
    key: str
    timeout: Optional[int] = 300
    retry_count: Optional[int] = 3
    
    @classmethod
    def from_env(cls): ...
    @classmethod  
    def from_file(cls): ...
    @classmethod
    def from_dict(cls): ...

4. Direct Method Names

# Good: Clear, direct names
def extract_invoice_data(self, file_path: str):
def extract_receipt_data(self, file_path: str):
def extract_id_data(self, file_path: str):

# Bad: Generic, abstract names
def process_document(self, document: Document, processor_type: ProcessorType):
def execute_analysis(self, request: AnalysisRequest):

Testing Rules

1. Start with Integration Tests

# Good: Test the real thing
def test_analyze_real_invoice():
    analyzer = DocumentAnalyzer()
    result = analyzer.extract_invoice_data('test_invoice.pdf')
    assert 'VendorName' in result

# Don't start with: Complex mocking
@patch('azure.ai.documentintelligence.DocumentIntelligenceClient')
def test_analyze_invoice_mocked(mock_client):
    # Complex setup...

2. One Test Per Feature

# Good: Simple, focused tests
def test_invoice_extraction():
    # Test one thing

def test_receipt_extraction():
    # Test one thing

# Bad: Complex test classes
class TestDocumentAnalyzer:
    @pytest.fixture
    def analyzer(self):
        # Complex setup
    
    @pytest.mark.parametrize("file_type,expected", [...])
    def test_analyze_various_formats(self):
        # Complex parameterized test

When to Add Complexity

Only add these when you actually hit the problem:

1. Add Async When You Have Multiple Documents

# Start with sync
def analyze_invoice(self, file_path: str):
    return self.client.begin_analyze_document(...)

# Add async only when processing multiple files
async def analyze_multiple_invoices(self, file_paths: list[str]):
    tasks = [self.analyze_invoice_async(path) for path in file_paths]
    return await asyncio.gather(*tasks)

2. Add Retries When API Calls Fail

# Start without retries
def analyze_document(self, model_id: str, file_path: str):
    with open(file_path, 'rb') as f:
        return self.client.begin_analyze_document(model_id, f)

# Add retries when you see failures
def analyze_document(self, model_id: str, file_path: str, retries: int = 3):
    for attempt in range(retries):
        try:
            with open(file_path, 'rb') as f:
                return self.client.begin_analyze_document(model_id, f)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # Simple exponential backoff

3. Add Validation When You Get Bad Input

# Start without validation
def analyze_invoice(self, file_path: str):
    with open(file_path, 'rb') as f:
        return self.client.begin_analyze_document("prebuilt-invoice", f)

# Add validation when you encounter issues
def analyze_invoice(self, file_path: str):
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    
    if not file_path.lower().endswith(('.pdf', '.jpg', '.png')):
        raise ValueError("Unsupported file format")
    
    with open(file_path, 'rb') as f:
        return self.client.begin_analyze_document("prebuilt-invoice", f)

Documentation Rules

1. Code Should Be Self-Documenting

# Good: Clear names, minimal comments
def extract_vendor_name_from_invoice(self, file_path: str) -> str:
    result = self.analyze_invoice(file_path)
    return result.documents[0].fields.get('VendorName', {}).get('content', '')

# Bad: Comments explaining unclear code
def process_doc(self, fp: str) -> str:
    # Extract vendor name from invoice document
    # using Azure Document Intelligence API
    # Returns vendor name string or empty string if not found
    res = self.analyze(fp, 'inv')
    return res.docs[0].flds.get('VN', {}).get('cont', '')

2. Docstrings Only for Public Methods

# Good: Docstring for public API
def extract_invoice_data(self, file_path: str) -> dict:
    """Extract structured data from invoice PDF.
    
    Args:
        file_path: Path to invoice PDF file
        
    Returns:
        Dictionary with extracted invoice fields
    """

def _parse_result(self, result):
    # Private method, no docstring needed
    pass

3. README with Examples Only

# Azure Document Intelligence Wrapper

## Install
pip install azure-ai-documentintelligence

## Usage
```python
from azure_doc_wrapper import DocumentAnalyzer

analyzer = DocumentAnalyzer()
invoice_data = analyzer.extract_invoice_data('invoice.pdf')
print(invoice_data['VendorName'])

That's it. No complex documentation until we have complex features.


File Organization (Start Simple)

Initial Structure

project/
├── azure_doc_wrapper.py   # Main wrapper class
├── example.py             # Usage examples
├── test_wrapper.py        # Basic tests
├── requirements.txt       # Dependencies
└── README.md              # Simple usage guide

Only Split When Needed (>500 lines)

project/
├── azure_doc_wrapper/
│   ├── __init__.py        # Main DocumentAnalyzer class
│   ├── invoice.py         # Invoice-specific methods (if >50 lines)
│   └── receipt.py         # Receipt-specific methods (if >50 lines)
├── tests/
│   └── test_analyzer.py
└── examples/
    ├── invoice_example.py
    └── receipt_example.py

Dependencies: Minimal

Start With

azure-ai-documentintelligence
python-dotenv  # Only if you need .env files

Don't Add Until Needed

  • pytest (add when you write tests)
  • requests (Azure SDK handles HTTP)
  • pydantic (add when you need data validation)
  • click (add when you need a CLI)
  • fastapi (add when you need an API)

Error Messages: User-Friendly

# Good: Clear error messages
def extract_invoice_data(self, file_path: str):
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Invoice file not found: {file_path}")

    try:
        with open(file_path, 'rb') as f:
            result = self.client.begin_analyze_document("prebuilt-invoice", f)
        return self._extract_fields(result)
    except Exception as e:
        raise RuntimeError(f"Failed to analyze invoice '{file_path}': {e}")

# Bad: Technical error messages
def extract_invoice_data(self, file_path: str):
    try:
        result = self.client.begin_analyze_document("prebuilt-invoice", file_data)
        return self._extract_fields(result)
    except HttpResponseError as e:
        raise DocumentIntelligenceException(
            f"HTTP {e.status_code}: {e.message}", 
            error_code=e.error_code,
            correlation_id=e.correlation_id
        )

Summary: Build Only What You Need

  1. Start with one file, one class
  2. Use simple functions, not complex patterns
  3. Add features when you need them, not before
  4. Test with real files, not mocks
  5. Use environment variables for config
  6. Clear names over comments
  7. User-friendly error messages
  8. Minimal dependencies

Remember: You can always refactor later when you understand the real requirements better.

.cursorrules - KISS & Agile Focused PoC Development

Core Principles

KISS (Keep It Simple, Stupid)

  • Prioritize simple, readable solutions over clever ones
  • Choose the most straightforward approach that solves the problem
  • Avoid premature optimization - make it work first, optimize later
  • Use clear, descriptive variable and function names
  • Minimize dependencies and external libraries
  • Write self-documenting code that doesn't need extensive comments

Agile Focus

  • Deliver working software quickly and iteratively
  • Focus on MVP (Minimum Viable Product) features first
  • Break down large tasks into small, manageable chunks
  • Prioritize features that provide immediate value
  • Keep solutions flexible for rapid changes and iterations

Code Style Guidelines

General Rules

  • Write code that a junior developer can understand in 6 months
  • One function/method should do one thing well
  • Keep functions small (ideally under 20 lines)
  • Use meaningful names instead of comments when possible
  • Avoid deep nesting (max 3 levels)
  • Prefer composition over inheritance
  • Use early returns to reduce nesting

File Organization

  • Keep files small and focused on a single responsibility
  • Use clear folder structure that reflects business logic
  • Avoid deeply nested directory structures
  • Name files descriptively based on their primary function

PoC-Specific Guidelines

Speed over Perfection

  • Hard-code values when configuration adds unnecessary complexity
  • Use inline styles/logic if it speeds up development
  • Skip elaborate error handling for non-critical paths
  • Focus on the happy path first, edge cases later
  • Use TODO comments liberally for future improvements

Testing Strategy

  • Write tests for core business logic only
  • Prefer integration tests over unit tests for PoCs
  • Test the critical user journey, skip edge cases initially
  • Use simple assertion libraries, avoid complex test frameworks

Documentation

  • README should explain what the PoC proves, not how to use it
  • Include setup instructions in 5 steps or less
  • Document assumptions and limitations clearly
  • Keep API documentation minimal but accurate

Technology Choices

Prefer Simple Tech Stack

  • Use technologies the team already knows
  • Choose boring, stable technologies over cutting-edge ones
  • Minimize the number of different languages/frameworks
  • Use cloud services instead of building infrastructure
  • Leverage existing platforms and APIs when possible

Database/Storage

  • Start with the simplest storage that works (files, SQLite, etc.)
  • Use managed services over self-hosted solutions
  • Avoid complex database schemas initially
  • Don't worry about scalability until it's proven necessary

Agile Practices

Development Process

  • Work in short sprints (1-2 weeks max)
  • Daily standup focusing on blockers and progress
  • Demo working features frequently (even if rough)
  • Get user feedback early and often
  • Pivot quickly based on learnings

Code Reviews

  • Focus reviews on logic correctness, not style perfection
  • Approve if it works and is readable
  • Save refactoring discussions for later iterations
  • Prioritize knowledge sharing over nitpicking

Version Control

  • Commit frequently with clear, simple messages
  • Use feature branches for anything taking more than a day
  • Merge to main often (daily if possible)
  • Don't worry about perfect commit history
  • Use conventional commits if the team prefers structure

Anti-Patterns to Avoid

Over-Engineering

  • Don't build abstractions until you need them at least 3 times
  • Avoid creating your own frameworks or libraries
  • Don't implement features "because we might need them later"
  • Skip elaborate configuration systems
  • Don't optimize for theoretical future requirements

Analysis Paralysis

  • Set strict time boxes for technical decisions
  • Choose the first viable solution, not the perfect one
  • Don't research every possible option
  • Make reversible decisions quickly
  • Document decisions to avoid re-debating them

Perfectionism

  • Ship with known minor bugs if they don't block core functionality
  • Don't spend days on edge cases that affect <1% of users
  • Accept technical debt in non-critical areas
  • Focus on user value over code aesthetics
  • Remember: this is a PoC, not production software

Success Metrics for PoC

  • Time to first working demo
  • Number of core user stories completed
  • Stakeholder feedback quality and frequency
  • Team velocity and morale
  • Technical feasibility validation
  • Business hypothesis validation

Remember

The goal is to prove or disprove a concept quickly and cheaply. Every decision should be evaluated against this objective. Perfect code that takes too long is worse than imperfect code that proves the concept and enables learning.
