githubEdit

Replace mode : Delete and insert

How it works

REPLACE Mode follows a two-step process: delete existing data within a defined scope, then insert new data.

This ensures clean data for each synchronization by removing old values before writing new ones.

Technical implementation:

1

Delete (step 1)

Use a DELETE statement with a WHERE clause to remove data within the sync scope.

2

Insert (step 2)

Use an INSERT statement to add the new data.

The key distinction is the deletion scope, which differs based on table type.

Visual representation:

Step 1 - DELETE: Remove data matching the scope
Step 2 - INSERT: Add new data
Result: Only new data remains in the scope

Deletion scope by table type

The scope of deletion is critical to understand in REPLACE Mode:

Fact tables (metrics)

Scope: Only the reference date being synchronized

Example with a daily metrics table:

Dimension tables (attributes)

Scope: The entire table

Example with a campaign attributes table:


circle-check

Advantages


circle-exclamation

Disadvantages

Use cases

Ideal for:

📊 Fact tables with recalculated metrics

📊 Dimension tables requiring current snapshot only

📊 Aggregated data that may be reprocessed

📊 Data quality fixes

Not suitable for:

❌ Dimension tables requiring historical tracking

❌ High-frequency synchronizations on large tables

❌ Event logs or immutable transactions

❌ Compliance or audit requirements


Best practices

1

Understand your deletion scope clearly

  • Fact tables: only reference date partition affected

  • Dimension tables: entire table replaced

  • Document this behavior for data consumers

2

Combine with appropriate lookback windows

  • Use lookback to capture source platform corrections

  • Balance data accuracy vs. processing costs

  • Typical: 3-7 days for advertising platforms

3

Implement robust error handling

  • Use transactions when possible

  • Monitor for partial failures

  • Have rollback/recovery procedures ready

4

Consider partitioning strategy

  • Ensure fact tables are partitioned by _quanti_date

  • Verify partition pruning is working correctly

  • Monitor partition-level costs

5

Evaluate historical data needs

  • If history is needed for dimensions, use UPSERT instead

  • For fact tables, REPLACE is often appropriate

  • Document the trade-off decision

6

Monitor synchronization performance

  • Track DELETE + INSERT operation times

  • Alert on failures or slowdowns

  • Optimize if sync windows become problematic

7

Backup critical dimension tables

  • Before migration to REPLACE mode, backup historical data

  • Consider archiving snapshots periodically

  • Implement retention policies if needed


Example: REPLACE with lookback window

Scenario: Google Ads metrics with 7-day attribution window

Without REPLACE mode: You'd have duplicate rows for 2025-01-15 with different conversion values.

With REPLACE mode: Clean data with the most accurate values.


Comparison: Fact tables vs Dimension tables

Aspect
Fact Tables
Dimension Tables

Deletion scope

Reference date only

Entire table

Data preserved

Other dates untouched

Nothing preserved

Historical tracking

✅ Yes (by date)

❌ No

Typical size

Very large

Small to medium

Sync frequency

Daily

Daily to weekly

Performance impact

Low (partition-level)

Medium to High (full table)

Use case

Metrics with corrections

Current state snapshots

Last updated