Author: Engineering Team
Date: 2026-02-04
Status: Proposal
Type: Feature Enhancement
This proposal outlines how pgsyncd should handle schema evolution when source and target databases diverge. The guiding principle is: stop syncing when schema changes could cause data loss, continue when changes are safe.
Current behavior: pgsyncd assumes schemas match. When they don't, behavior is undefined and can lead to silent data loss or sync failures.
Proposed behavior: Validate schemas continuously, stop sync on divergence that risks data loss, continue with warnings when safe.
pgsyncd currently has no schema validation. It assumes:
- Source and target schemas are identical
- Column names and types match
- No columns are added or removed during sync
Real-world problems:
1. New column added to source
   - Source: users(id, name, email, phone)
   - Target: users(id, name, email)
   - Current behavior: Undefined (likely INSERT fails or phone data is lost)
2. Column type changed
   - Source: products.price NUMERIC(10,2)
   - Target: products.price INTEGER
   - Current behavior: Undefined (likely silent data corruption: $19.99 → $20)
3. Column removed from source
   - Source: users(id, name, email) (removed legacy_id)
   - Target: users(id, name, email, legacy_id)
   - Current behavior: Undefined
Resulting risks:
- Silent data loss: Columns skipped without notice
- Data corruption: Lossy type casts without warning
- Production incidents: Sync fails unexpectedly when schemas diverge
- Difficult debugging: No visibility into schema mismatches
Design principles:
- Never silently lose data - If we can't sync a column safely, stop and alert
- Fail loudly, not silently - Schema problems should be impossible to miss
- Provide escape hatches - Allow overrides for emergency situations
- Resume-able - Schema issues shouldn't corrupt sync state
- Clear path to resolution - Error messages show exactly how to fix
For each schema divergence scenario:
Can we sync without data loss?
→ YES: Continue with warnings
→ NO: Stop and require explicit decision
→ MAYBE: Stop and provide options
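A minimal sketch of how this framework could be encoded in the sync engine (the enum and the SchemaDiff field names are illustrative, not a settled API):

/* Sketch of the per-divergence decision, mirroring the framework above. */
typedef enum {
    SYNC_CONTINUE,       /* Safe: proceed, possibly logging a warning */
    SYNC_STOP_DATA_LOSS, /* Unsafe: stop and require an explicit decision */
    SYNC_STOP_AMBIGUOUS  /* Maybe: stop and present options to the operator */
} SyncVerdict;

SyncVerdict decide_sync_action(const SchemaDiff *diff) {
    if (diff->has_new_columns)      /* Source-only columns: data loss risk */
        return SYNC_STOP_DATA_LOSS;
    if (diff->has_unsafe_casts)     /* Lossy type cast: corruption risk */
        return SYNC_STOP_DATA_LOSS;
    if (diff->has_removed_columns)  /* Orphaned target column: operator must choose */
        return SYNC_STOP_AMBIGUOUS;
    return SYNC_CONTINUE;           /* Widening casts, extra nullable target columns */
}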
Scenario: New column added to source
Situation: Source has users.phone_number, target doesn't
Risk: DATA LOSS - source is collecting data we can't store
Behavior: STOP SYNC
Rationale:
- If we skip the column, we lose data that exists in source
- When target is updated later, we'd need a backfill
- Better to stop immediately and force schema alignment
Error Message:
[ERROR] Schema mismatch detected on public.users
Source has new column: 'phone_number' (type: varchar)
Target does not have this column
Sync STOPPED to prevent data loss.
Actions:
1. Add column to target (RECOMMENDED):
ALTER TABLE target.users ADD COLUMN phone_number varchar;
2. Or explicitly skip this column (NOT RECOMMENDED - data loss):
pgsyncd sync --skip-columns=users.phone_number
3. Or allow all source-only columns (DANGEROUS):
pgsyncd sync --allow-schema-drift=source_superset
Status output:
{
"table": "public.users",
"status": "schema_mismatch",
"error": "Source has new column 'phone_number' not in target",
"sync_stopped_at": "2026-02-04T20:15:00Z",
"resolution": "Add column to target or use --skip-columns"
}

Scenario: New column added to target
Situation: Target has users.created_by_system, source doesn't
Risk: Depends on target column constraints
Behavior:
- ✅ CONTINUE if target column is nullable or has DEFAULT
- 🛑 STOP if target column is NOT NULL with no DEFAULT
Rationale:
- No source data being lost
- Target column for audit/metadata is common pattern
- Safe if target handles missing values
Safe case (continue):
-- Target column nullable → INSERT will use NULL
ALTER TABLE users ADD COLUMN created_by_system TEXT;
-- Target column has DEFAULT → INSERT will use default
ALTER TABLE users ADD COLUMN created_by_system TEXT DEFAULT 'pgsyncd';

Warning message:
[WARN] Schema difference on public.users
Target has extra column: 'created_by_system' (nullable)
This column will be NULL for all synced rows
This is OK for audit/metadata columns
Unsafe case (stop):
-- NOT NULL with no DEFAULT → INSERT will fail
ALTER TABLE users ADD COLUMN created_by_system TEXT NOT NULL;

Error message:
[ERROR] Schema mismatch on public.users
Target has new column 'created_by_system' (NOT NULL, no DEFAULT)
Source does not have this column
Cannot insert rows - column requires a value.
Fix:
ALTER TABLE target.users ALTER COLUMN created_by_system DROP NOT NULL;
-- OR --
ALTER TABLE target.users ALTER COLUMN created_by_system SET DEFAULT 'pgsyncd';
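The nullable-or-DEFAULT check maps directly onto information_schema; a query the validator might run (a sketch, using the table and column from the example above):

-- Sketch: is a target-only column safe to leave unset? Safe when it is
-- nullable or has a DEFAULT; otherwise INSERTs from pgsyncd would fail.
SELECT column_name,
       (is_nullable = 'YES' OR column_default IS NOT NULL) AS safe_to_skip
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'users'
  AND column_name = 'created_by_system';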
Scenario: Column type changed (lossy)
Situation: Source has price NUMERIC(10,2), target has price INTEGER
Risk: DATA CORRUPTION - casting loses precision
Behavior: STOP SYNC
Rationale:
- Casting $19.99 → $20 silently corrupts data
- Silent corruption is worse than stopping
- Operator must explicitly allow precision loss
Error message:
[ERROR] Type mismatch on public.products.price
Source: numeric(10,2)
Target: integer
Casting would lose precision (19.99 → 20)
Fix:
1. Match types (RECOMMENDED):
ALTER TABLE target.products ALTER COLUMN price TYPE numeric(10,2);
2. Or explicitly allow cast (DATA LOSS):
pgsyncd sync --allow-type-cast=products.price:integer
3. Or allow all casts (DANGEROUS):
pgsyncd sync --allow-schema-drift=type_casts --force
Scenario: Column type changed (safe widening)
Situation: Source has id INTEGER, target has id BIGINT
Risk: NONE - safe widening cast
Behavior: CONTINUE
Rationale:
- Safe casts: int→bigint, varchar(10)→varchar(20), date→timestamp
- No data loss, no precision loss
- Common pattern for capacity planning
Safe casts list (encoded as a whitelist in the sketch below):
- smallint→integer→bigint
- integer→numeric
- varchar(N)→varchar(M) where M > N
- varchar(N)→text
- date→timestamp
- timestamp→timestamptz
Info message:
[INFO] Type difference on public.users.id
Source: integer
Target: bigint
Safe widening cast - continuing
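The safe-cast list lends itself to a small whitelist lookup; a sketch (is_safe_widening_cast is an illustrative name, and the length-parameterized varchar rules would need separate, length-aware handling):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Whitelist of widening casts from the list above. */
static const struct { const char *from; const char *to; } safe_casts[] = {
    {"smallint", "integer"}, {"smallint", "bigint"}, {"integer", "bigint"},
    {"integer", "numeric"},  {"date", "timestamp"},  {"timestamp", "timestamptz"},
};

bool is_safe_widening_cast(const char *from, const char *to) {
    for (size_t i = 0; i < sizeof(safe_casts) / sizeof(safe_casts[0]); i++)
        if (strcmp(safe_casts[i].from, from) == 0 &&
            strcmp(safe_casts[i].to, to) == 0)
            return true;
    return false;
}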
Scenario: Column removed from source
Situation: Source removed users.legacy_id, target still has it
Risk: AMBIGUOUS - what should happen to target column?
Behavior: STOP SYNC
Rationale:
- Multiple valid approaches:
- Keep target column (preserve historical data)
- Set to NULL (mark as removed)
- Remove from target (match source)
- Operator must choose explicitly
Error message:
[ERROR] Schema mismatch on public.users
Target has column 'legacy_id' that no longer exists in source
Decide what to do with target column:
1. Remove from target:
ALTER TABLE target.users DROP COLUMN legacy_id;
2. Keep existing values, stop updating:
pgsyncd sync --preserve-columns=users.legacy_id
3. Set to NULL on all synced rows:
pgsyncd sync --null-removed-columns=users.legacy_id
Scenario: Column order differs
Situation: Source has (id, name, email), target has (id, email, name)
Risk: NONE - if we use explicit column names
Behavior: CONTINUE (always use explicit column names)
Rationale:
- Never rely on column order
- Always generate INSERT INTO table (col1, col2) VALUES ($1, $2)
- This should "just work"
Implementation:
-- Always use explicit column names:
INSERT INTO target.users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
-- Never use positional:
INSERT INTO target.users VALUES (1, 'Alice', 'alice@example.com') -- ❌ WRONG

Each table tracks a schema fingerprint containing:
- Column names (in order)
- Column types (with precision/scale)
- NOT NULL constraints
- DEFAULT expressions
- Primary key definition
typedef struct SchemaFingerprint {
char *tableName;
int columnCount;
SchemaColumn *columns; // Array of column definitions
char *primaryKey;
char *fingerprintHash; // SHA256 of canonical schema
} SchemaFingerprint;
typedef struct SchemaColumn {
char *name;
char *dataType;
bool notNull;
char *defaultExpr;
int position;
} SchemaColumn;
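The fingerprintHash above is described as a SHA-256 of the canonical schema. A sketch of how it might be computed, assuming OpenSSL is linked; the canonical one-line-per-column format is an illustration, not part of the proposal:

#include <openssl/sha.h>
#include <stdio.h>
#include <string.h>

void compute_fingerprint_hash(const SchemaFingerprint *fp, char out[65]) {
    /* Canonical text form: "name|type|notnull|default", one line per column,
     * in ordinal order, so logically identical schemas hash identically. */
    char canonical[8192] = "";
    for (int i = 0; i < fp->columnCount; i++) {
        const SchemaColumn *c = &fp->columns[i];
        char line[512];
        snprintf(line, sizeof(line), "%s|%s|%d|%s\n",
                 c->name, c->dataType, (int)c->notNull,
                 c->defaultExpr ? c->defaultExpr : "");
        strncat(canonical, line, sizeof(canonical) - strlen(canonical) - 1);
    }
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256((const unsigned char *)canonical, strlen(canonical), digest);
    for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
        sprintf(out + i * 2, "%02x", digest[i]); /* Hex-encode: 64 chars + NUL */
}

Setup flow:
pgsyncd setup --source <uri> --target <uri>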
# For each table:
1. Query source schema: SELECT * FROM information_schema.columns WHERE table_name = ...
2. Query target schema: SELECT * FROM information_schema.columns WHERE table_name = ...
3. Compare schemas:
- Identify common columns
- Identify source-only columns (data loss risk!)
- Identify target-only columns (check NOT NULL)
- Identify type mismatches (check if safe cast)
4. Validate compatibility:
- If source has extra columns → FAIL
- If target has NOT NULL column without DEFAULT → FAIL
- If lossy type cast required → FAIL
- If only warnings → PROCEED
5. Store schema fingerprint in catalog:
- Common columns to sync
- Safe casts to apply
- Warnings issued

Catalog schema:
CREATE TABLE schema_fingerprints (
table_oid INTEGER PRIMARY KEY,
table_name TEXT NOT NULL,
source_fingerprint TEXT NOT NULL, -- JSON schema from source
target_fingerprint TEXT NOT NULL, -- JSON schema from target
common_columns TEXT[], -- Columns safe to sync
compatibility_status TEXT, -- 'compatible', 'warning', 'error'
validation_warnings TEXT[],
last_validated TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW()
);

// Before syncing each table, validate schema hasn't changed
bool validate_schema_before_sync(SyncdTable *table) {
// Get current schema from both databases
SchemaFingerprint *source_current = get_current_source_schema(table);
SchemaFingerprint *target_current = get_current_target_schema(table);
// Get stored fingerprints from setup
SchemaFingerprint *source_stored = table->sourceSchemaFingerprint;
SchemaFingerprint *target_stored = table->targetSchemaFingerprint;
// Check if schemas changed
if (!fingerprints_match(source_current, source_stored)) {
log_error("Source schema changed for %s.%s since setup",
table->nspname, table->relname);
SchemaDiff diff = compute_diff(source_stored, source_current);
if (diff.has_new_columns) {
log_error(" New columns in source: %s", join(diff.new_columns));
log_error(" Sync STOPPED to prevent data loss");
log_error(" Run: pgsyncd schema diff --table %s", table->relname);
return false;
}
if (diff.has_removed_columns) {
log_error(" Removed columns from source: %s", join(diff.removed_columns));
log_error(" Sync STOPPED - ambiguous what to do with target columns");
return false;
}
if (diff.has_type_changes) {
log_error(" Type changes: %s", format_type_changes(diff.type_changes));
log_error(" Sync STOPPED to prevent data corruption");
return false;
}
}
if (!fingerprints_match(target_current, target_stored)) {
log_warn("Target schema changed for %s.%s since setup",
table->nspname, table->relname);
// Re-validate if still compatible
if (!validate_target_changes_safe(table, target_current)) {
return false;
}
}
return true;
}

// Use only common columns when generating SQL
void build_insert_sql(SyncdTable *table, char *sql_buf, size_t len) {
// Get list of columns safe to sync (from schema fingerprint)
char **columns = table->common_columns;
int column_count = table->common_column_count;
// Build column list: "id, name, email, created_at"
char column_list[1024];
join_columns(columns, column_count, ", ", column_list, sizeof(column_list));
// Build placeholder list: "$1, $2, $3, $4"
char placeholders[1024];
build_placeholders(column_count, placeholders, sizeof(placeholders));
// Generate INSERT with explicit column names
snprintf(sql_buf, len,
"INSERT INTO %s.%s (%s) VALUES (%s) ON CONFLICT (%s) DO UPDATE SET %s",
table->target_nspname,
table->target_relname,
column_list,
placeholders,
table->primary_key,
build_update_set_clause(columns, column_count));
}
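build_insert_sql calls build_update_set_clause, which is not shown above; a minimal sketch, under the assumption that a statically buffered string is acceptable for illustration (a production version would take a caller-supplied buffer):

/* Hypothetical helper: builds "col1 = EXCLUDED.col1, col2 = EXCLUDED.col2, ..."
 * for the ON CONFLICT ... DO UPDATE clause generated above. */
static char *build_update_set_clause(char **columns, int column_count) {
    static char buf[1024];
    buf[0] = '\0';
    for (int i = 0; i < column_count; i++) {
        char item[160];
        snprintf(item, sizeof(item), "%s%s = EXCLUDED.%s",
                 i > 0 ? ", " : "", columns[i], columns[i]);
        strncat(buf, item, sizeof(buf) - strlen(buf) - 1);
    }
    return buf;
}

# Validate schemas without starting sync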
pgsyncd schema validate \
--source "postgresql://source/db" \
--target "postgresql://target/db"
# Output:
╔════════════════════════════════════════════════════════════════╗
║ Schema Validation Report ║
╠════════════════════════════════════════════════════════════════╣
║ public.users ✓ OK ║
║ 15 source columns, 14 target columns ║
║ ⚠ Source has 'phone' (not in target) - WILL STOP SYNC ║
║ ║
║ public.orders ✓ OK ║
║ All 12 columns match perfectly ║
║ ║
║ public.products ❌ ERROR ║
║ Type mismatch: price (numeric vs integer) ║
║ Action: Fix target schema or use --allow-type-cast ║
╚════════════════════════════════════════════════════════════════╝
Summary:
✓ 1 table fully compatible
⚠ 1 table compatible with warnings
❌ 1 table INCOMPATIBLE
Recommendation: Fix issues before running sync

# Show detailed schema differences
pgsyncd schema diff \
--source "postgresql://source/db" \
--target "postgresql://target/db" \
--table users
# Output:
Table: public.users
Columns in common: 13
✓ id (integer) - types match
✓ name (varchar) - types match
✓ email (varchar) - types match
...
Source only: 2 columns
+ phone varchar(20) ⚠ Data loss risk
+ ssn varchar(11) ⚠ Data loss risk
Target only: 1 column
+ created_by_system text DEFAULT 'pgsyncd' ✓ Safe (has default)
Type mismatches: 0
Sync status: INCOMPATIBLE
Reason: Source has columns not in target (data loss risk)
Fix:
ALTER TABLE target.users ADD COLUMN phone varchar(20);
ALTER TABLE target.users ADD COLUMN ssn varchar(11);

# Generate SQL to align target with source
pgsyncd schema sync \
--source "postgresql://source/db" \
--target "postgresql://target/db" \
--dry-run
# Output:
[DRY RUN] Would execute on target:
-- Table: public.users
ALTER TABLE public.users ADD COLUMN phone varchar(20);
ALTER TABLE public.users ADD COLUMN ssn varchar(11);
-- Table: public.products
ALTER TABLE public.products ALTER COLUMN price TYPE numeric(10,2);
-- Apply with:
pgsyncd schema sync --source ... --target ... --apply

# Actually apply migrations
pgsyncd schema sync \
--source "postgresql://source/db" \
--target "postgresql://target/db" \
--apply
# Output:
[INFO] Aligning target schema with source...
[INFO] Adding column target.users.phone
[INFO] Adding column target.users.ssn
[INFO] Altering column target.products.price
[SUCCESS] Target schema now matches source
[SUCCESS] Run 'pgsyncd setup' to refresh schema fingerprints

# Allow specific schema drift types
--allow-schema-drift=<type>
Types:
- source_superset: Allow source to have extra columns (SKIP them - data loss!)
- target_superset: Allow target to have extra columns (use NULL/DEFAULT)
- type_casts: Allow lossy type casts (precision loss!)
- all: Allow any drift (DANGEROUS - requires --force)
# Granular column control
--skip-columns=<table.column>,<table.column>
Example: --skip-columns=users.ssn,users.phone
--preserve-columns=<table.column>,<table.column>
Keep target columns that were removed from source
--null-removed-columns=<table.column>,<table.column>
Set target columns to NULL when removed from source
# Type cast control
--allow-type-cast=<table.column>:<target_type>
Example: --allow-type-cast=products.price:integer
--allow-all-type-casts
Allow all type casts, even lossy ones (DANGEROUS)
# Safety override
--force
Required with dangerous flags like --allow-schema-drift=all
Prevents accidental data loss

# pgsyncd.yaml
schema:
# Validation strictness
validation: strict # strict | permissive | disabled
# Validation frequency
validation_interval: 60s # How often to re-check schemas during sync
# Column handling
source_added_columns: stop # stop | skip (data loss!)
target_added_columns: allow # allow | stop
source_removed_columns: stop # stop | preserve | null
# Type handling
type_mismatches: stop # stop | cast_if_safe | cast_always
safe_casts_allowed: true # Allow int→bigint, varchar(10)→varchar(20), etc
# Per-table overrides
table_overrides:
- table: public.audit_logs
skip_columns: [user_ip, user_agent] # Don't sync PII to reporting DB
- table: public.products
allow_type_cast:
price: integer # Explicitly allow lossy cast for this table
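The validation_interval setting implies a freshness check inside the sync loop; a sketch, where last_schema_validation is a hypothetical per-table field and the validate function is the one shown earlier:

#include <time.h>

/* Sketch: honor validation_interval by skipping re-validation while the
 * stored fingerprints are still considered fresh. */
bool maybe_revalidate_schema(SyncdTable *table, int interval_seconds) {
    time_t now = time(NULL);
    if (now - table->last_schema_validation < interval_seconds)
        return true; /* Last validation still fresh; skip the re-check */
    table->last_schema_validation = now;
    return validate_schema_before_sync(table);
}

Metrics:

# Schema validation status per table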
pgsyncd_schema_status{namespace, deployment, table, schema}
1 = schemas match
0 = schema mismatch blocking sync
# Schema mismatches by type
pgsyncd_schema_mismatches_total{namespace, deployment, table, type}
type: source_added_column, target_added_column, type_mismatch,
source_removed_column, safe_cast, unsafe_cast
# Tables blocked by schema issues
pgsyncd_tables_blocked_schema_total{namespace, deployment, reason}
reason: data_loss_risk, type_mismatch, target_not_null
# Rows skipped due to schema issues
pgsyncd_rows_skipped_schema_total{namespace, deployment, table, reason}
reason: column_mismatch, cast_error, constraint_violation
# Schema validations performed
pgsyncd_schema_validations_total{namespace, deployment, result}
result: pass, fail_safe, fail_unsafe, warning
Panel: Schema Health Status
# Table showing schema status for all tables
pgsyncd_schema_status{namespace="$namespace", deployment="$deployment"}
Panel: Schema Mismatch Alerts
# Count of tables with schema issues
count(pgsyncd_schema_status{namespace="$namespace", deployment="$deployment"} == 0)
Alert: Schema Blocking Sync
- alert: PgsyncSchemaBlocked
expr: pgsyncd_schema_status == 0
for: 5m
annotations:
summary: "pgsyncd sync blocked by schema mismatch on {{ $labels.table }}"
description: |
Table {{ $labels.table }} has schema mismatch blocking sync.
Check: pgsyncd schema diff --table {{ $labels.table }}
Dashboard: {{ $grafana_url }}/d/pgsyncd
labels:
severity: critical

Rollout plan:
Week 1-2: Add validation, don't block
# Validate schemas but only warn, don't stop sync
pgsyncd sync --schema-validation=warn-only
# Logs schema mismatches but continues syncing
[WARN] Schema mismatch on public.users (source has 'phone', target doesn't)
[WARN] This will become a blocking error in version X.Y
[WARN] Continuing sync for now (--schema-validation=warn-only)

Metrics collected:
- How many tables have schema mismatches in production?
- What types of mismatches are most common?
- Are any mismatches causing data loss?
Week 3+: Enable blocking
# Default behavior: stop on schema mismatch
pgsyncd sync # --schema-validation=strict (default)
# Output:
[ERROR] Schema mismatch on public.users
[ERROR] Source has new column 'phone' not in target
[ERROR] Sync STOPPED to prevent data loss

Future enhancement:
# Automatically apply schema changes to target
pgsyncd sync --auto-migrate-schema
# Or in config:
schema:
auto_migrate: true
auto_migrate_types: [add_column, safe_type_change]
require_approval: true # Require operator confirmation

╔════════════════════════════════════════════════════════════════╗
║ SYNC STOPPED: Schema Mismatch ║
╠════════════════════════════════════════════════════════════════╣
║ Table: public.users ║
║ ║
║ Problem: ║
║ Source database has new column 'phone_number' (varchar) ║
║ Target database does not have this column ║
║ ║
║ Impact: ║
║ If sync continues, phone_number data will be LOST ║
║ Users inserted/updated since schema change: ~5,000 ║
║ ║
║ Fix (RECOMMENDED): ║
║ Add column to target database: ║
║ ║
║ ALTER TABLE target.public.users ║
║ ADD COLUMN phone_number varchar; ║
║ ║
║ Alternative (NOT RECOMMENDED - DATA LOSS): ║
║ Skip this column: ║
║ ║
║ pgsyncd sync --skip-columns=users.phone_number ║
║ ║
║ More info: ║
║ pgsyncd schema diff --table users ║
╚════════════════════════════════════════════════════════════════╝
╔════════════════════════════════════════════════════════════════╗
║ SYNC STOPPED: Type Mismatch ║
╠════════════════════════════════════════════════════════════════╣
║ Table: public.products ║
║ Column: price ║
║ ║
║ Problem: ║
║ Source: numeric(10,2) - e.g., $19.99 ║
║ Target: integer - e.g., $20 ║
║ ║
║ Impact: ║
║ Casting would lose cents: $19.99 → $20 ║
║ This is DATA CORRUPTION ║
║ ║
║ Fix (RECOMMENDED): ║
║ Match target type to source: ║
║ ║
║ ALTER TABLE target.public.products ║
║ ALTER COLUMN price TYPE numeric(10,2); ║
║ ║
║ Alternative (DATA LOSS): ║
║ Explicitly allow precision loss: ║
║ ║
║ pgsyncd sync --allow-type-cast=products.price:integer ║
║ ║
╚════════════════════════════════════════════════════════════════╝
// tests/syncd/test_schema_validation.c
void test_detect_source_added_column() {
// Setup: Create source with extra column
SchemaFingerprint *source = create_schema(
"public", "users",
(SchemaColumn[]){
{"id", "integer", false, NULL},
{"name", "varchar", false, NULL},
{"phone", "varchar", false, NULL} // Extra column
}, 3
);
SchemaFingerprint *target = create_schema(
"public", "users",
(SchemaColumn[]){
{"id", "integer", false, NULL},
{"name", "varchar", false, NULL}
}, 2
);
// Test
SchemaDiff diff = compute_schema_diff(source, target);
// Assert
assert(diff.has_data_loss_risk == true);
assert(diff.source_added_count == 1);
assert(strcmp(diff.source_added[0], "phone") == 0);
}
void test_allow_safe_type_widening() {
SchemaFingerprint *source = create_schema_with_type("id", "integer");
SchemaFingerprint *target = create_schema_with_type("id", "bigint");
SchemaDiff diff = compute_schema_diff(source, target);
assert(diff.has_type_mismatches == true);
assert(diff.has_unsafe_casts == false); // int→bigint is safe
assert(diff.has_data_loss_risk == false);
}
void test_detect_lossy_type_cast() {
SchemaFingerprint *source = create_schema_with_type("price", "numeric");
SchemaFingerprint *target = create_schema_with_type("price", "integer");
SchemaDiff diff = compute_schema_diff(source, target);
assert(diff.has_unsafe_casts == true);
assert(diff.has_data_loss_risk == true);
}

#!/usr/bin/env bash
# tests/syncd/test_schema_evolution.sh
test_source_added_column_stops_sync() {
# Setup: Create source with extra column
psql $SOURCE -c "ALTER TABLE users ADD COLUMN phone varchar"
# Test: Sync should fail
if pgsyncd sync --source $SOURCE --target $TARGET --dir $WORKDIR; then
error "Sync should have failed on schema mismatch"
return 1
fi
# Verify error message
if ! grep -q "Source has new column 'phone'" sync.log; then
error "Expected schema mismatch error"
return 1
fi
# Fix schema
psql $TARGET -c "ALTER TABLE users ADD COLUMN phone varchar"
# Sync should now succeed
pgsyncd sync --source $SOURCE --target $TARGET --dir $WORKDIR
}
test_target_added_nullable_column_continues() {
# Setup: Add nullable column to target
psql $TARGET -c "ALTER TABLE users ADD COLUMN metadata text"
# Test: Sync should succeed with warning
pgsyncd sync --source $SOURCE --target $TARGET --dir $WORKDIR 2>&1 | tee sync.log
# Verify warning logged
grep -q "Target has extra column 'metadata'" sync.log
# Verify data synced
assert_row_count_matches
}
test_escape_hatch_skip_columns() {
# Setup: Source has extra column
psql $SOURCE -c "ALTER TABLE users ADD COLUMN ssn varchar"
# Test: Skip column explicitly
pgsyncd sync \
--source $SOURCE \
--target $TARGET \
--dir $WORKDIR \
--skip-columns=users.ssn
# Verify sync succeeded
assert_row_count_matches
# Verify SSN data not in target
assert_column_not_exists "users" "ssn" $TARGET
}

Open questions:

- Performance impact of schema validation?
  - How often to re-validate during a long-running sync?
  - Cache fingerprints? For how long?
  - Query information_schema on every loop iteration?
- Backward compatibility?
  - Should this be opt-in initially? (--schema-validation=strict)
  - Or opt-out for existing deployments? (--schema-validation=disabled)
- Schema sync automation?
  - Should pgsyncd automatically apply safe schema changes to target?
  - Or always require manual approval?
  - How to handle production safety?
- Partial table sync?
  - If 10 tables are syncing and 1 has a schema issue, should we stop all 10 tables, or continue 9 and stop 1?
  - Current proposal: Continue healthy tables, stop the problematic one
- Schema change detection latency?
  - How quickly should we detect schema changes?
  - Re-validate every loop iteration (slow)?
  - Re-validate every N minutes?
  - Re-validate on error only?
Current state (before this proposal):
- Schema-related incidents per month: ~5-10
- Time to detect schema issue: 2-24 hours (manual discovery)
- Data loss incidents: ~1-2 per quarter
- Mean time to recovery: 2-4 hours (find issue, fix schema, backfill)
Expected with this proposal:
- Schema-related incidents per month: 0-1 (only if validation disabled)
- Time to detect schema issue: < 1 minute (automatic detection)
- Data loss incidents: 0 (sync stops before data loss)
- Mean time to recovery: < 30 minutes (clear error message, fix schema, resume)
# Incidents caused by schema issues
rate(pgsyncd_schema_status{status="error"}[7d])
# Time to detect
pgsyncd_schema_validation_duration_seconds
# Tables blocked
pgsyncd_tables_blocked_schema_total
Alternatives considered:

Alternative 1: Best-effort sync
Approach: Sync whatever columns exist in both, ignore differences
Pros:
- Never blocks sync
- Simpler implementation
Cons:
- Silent data loss (worst case scenario)
- No visibility into problems
- Difficult to debug later
Rejected: Violates "never silently lose data" principle
Alternative 2: Strict schema versioning
Approach: Require explicit schema version in catalog, fail on any change
Pros:
- Maximum safety
- Explicit schema contracts
Cons:
- Too strict for normal operations
- Every schema change requires manual intervention
- Poor developer experience
Rejected: Too rigid for real-world use
Alternative 3: Explicit column lists
Approach: Only sync columns explicitly listed in config
Pros:
- Explicit about what to sync
- Schema changes don't break sync
Cons:
- High maintenance (list every column)
- Error-prone (forget to add new columns)
- Doesn't solve type mismatch problem
Rejected: Too much manual work, error-prone
This proposal provides a fail-safe approach to schema evolution:
✅ Stops sync when data loss risk exists
✅ Continues when changes are safe
✅ Clear error messages with resolution steps
✅ Escape hatches for emergencies
✅ Observable via metrics
Key insight: When schemas diverge, the question isn't "can we technically continue?" The question is "will continuing cause data loss or corruption?" If yes → STOP. If no → CONTINUE.
This approach never silently loses data, makes problems visible immediately, and provides clear paths to resolution.
Appendix: schema comparison queries

-- Source schema query
SELECT
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale,
is_nullable,
column_default,
ordinal_position
FROM information_schema.columns
WHERE table_schema = $1
AND table_name = $2
ORDER BY ordinal_position;

-- Target schema query: same as the source query

-- Columns in source but not target (data loss risk)
SELECT s.column_name, s.data_type
FROM source_schema s
LEFT JOIN target_schema t ON s.column_name = t.column_name
WHERE t.column_name IS NULL;
-- Columns in target but not source
SELECT t.column_name, t.data_type, t.is_nullable, t.column_default
FROM target_schema t
LEFT JOIN source_schema s ON t.column_name = s.column_name
WHERE s.column_name IS NULL;
-- Type mismatches
SELECT
s.column_name,
s.data_type AS source_type,
t.data_type AS target_type
FROM source_schema s
JOIN target_schema t ON s.column_name = t.column_name
WHERE s.data_type != t.data_type
   OR s.character_maximum_length IS DISTINCT FROM t.character_maximum_length
   OR s.numeric_precision IS DISTINCT FROM t.numeric_precision
   OR s.numeric_scale IS DISTINCT FROM t.numeric_scale;
-- IS DISTINCT FROM treats NULLs as comparable values, so e.g. varchar(10)
-- vs unbounded varchar (NULL max length) is still flagged as a mismatch.