
Insert mode: Insert without deleting

How it works

INSERT Mode follows a simple principle: new data is appended to existing data without any deletion.

Each synchronization adds rows to the table, regardless of whether similar or identical data already exists.

Technical implementation:

INSERT Mode relies on plain INSERT SQL statements: there is no DELETE or MERGE step and no WHERE clause to evaluate, as sketched in the example after this list. This means:

  • No need to scan existing data to check for duplicates

  • No complex filtering or comparison operations

  • Direct insertion into the table
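
As a rough sketch, a synchronization issues nothing more than a plain multi-row INSERT; the table and column names below are taken from the campaign_stats example that follows:

```sql
-- Illustrative only: INSERT Mode simply appends the new batch of rows.
INSERT INTO campaign_stats (date, campaign_id, impressions, clicks, _quanti_loaded_at)
VALUES
  ('2025-01-15', 'camp_123', 1050, 52, '2025-01-17 09:00:00'),
  ('2025-01-15', 'camp_456',  820, 31, '2025-01-17 09:00:00');
-- No DELETE, no MERGE, nothing to scan: existing rows are left untouched.
```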

Visual representation:

Existing data: [A, B, C]
New data:      [D, E]
Result:        [A, B, C, D, E]

Concrete example with a fact table:

Initial state (campaign_stats table):
| date       | campaign_id | impressions | clicks | _quanti_loaded_at   |
|------------|-------------|-------------|--------|---------------------|
| 2025-01-15 | camp_123    | 1000        | 50     | 2025-01-16 08:00:00 |
| 2025-01-15 | camp_456    | 800         | 30     | 2025-01-16 08:00:00 |

After synchronization (same reference date):
| date       | campaign_id | impressions | clicks | _quanti_loaded_at   |
|------------|-------------|-------------|--------|---------------------|
| 2025-01-15 | camp_123    | 1000        | 50     | 2025-01-16 08:00:00 |
| 2025-01-15 | camp_456    | 800         | 30     | 2025-01-16 08:00:00 |
| 2025-01-15 | camp_123    | 1050        | 52     | 2025-01-17 09:00:00 | ← new row
| 2025-01-15 | camp_456    | 820         | 31     | 2025-01-17 09:00:00 | ← new row

Key characteristics:

  • No deletion operations

  • No updates to existing rows

  • Each synchronization adds new rows

  • Table grows continuously


Advantages

  • Lowest-cost write path: no scanning of existing data, no deletion or update operations

  • Simple and predictable behavior: every synchronization just appends rows

  • Full history is preserved, since earlier versions of a record are never removed

Disadvantages

  • The table grows continuously, increasing storage costs

  • Duplicates accumulate and must be handled downstream

  • Queries have to scan more rows and apply deduplication logic


Use cases

Ideal for:

  • Event logs and activity streams

  • Transaction records

  • Timestamped metrics with high granularity

  • Raw data landing zones

Not suitable for:

  • Dimension tables (attributes)

  • Daily aggregated metrics that may be recalculated

  • Master data or reference tables


Best practices

1. Leverage _quanti_loaded_at for deduplication

  • Use _quanti_loaded_at systematically in transformation queries to identify the latest version.

  • This field is the key to recreating uniqueness.

  • Document the deduplication logic for data consumers.

2. Implement deduplication in the transformation layer

  • Create deduplicated views or tables downstream.

  • Use ROW_NUMBER() with PARTITION BY on primary key fields.

  • Order by _quanti_loaded_at DESC to keep the most recent version (see the example queries at the end of this page).

3. Ensure true immutability

  • Only use INSERT Mode for data that genuinely never changes once created.

  • Verify that your source platform doesn't update historical records.

  • If updates do occur, INSERT Mode creates duplicate rows with no clear way to decide which version is correct.

4. Monitor table growth

  • Set up alerts for unexpected table size increases (a sketch query follows this list).

  • Implement data retention policies if needed.

  • Consider archiving or partitioning old data.

  • Balance storage costs against processing cost savings.
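
One possible way to watch growth, sketched against the campaign_stats example used earlier:

```sql
-- Sketch: row counts per load batch, to spot unexpected jumps in table size.
SELECT
  _quanti_loaded_at,
  COUNT(*) AS rows_loaded
FROM campaign_stats
GROUP BY _quanti_loaded_at
ORDER BY _quanti_loaded_at DESC;
```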

5. Document the choice

  • Clearly communicate to data consumers why INSERT Mode is used.

  • Provide example queries showing how to handle duplicates.

  • Explain the data model and deduplication strategy.


Example: Recreating uniqueness with _quanti_loaded_at

The _quanti_loaded_at field allows you to reconstruct table uniqueness before transformations.

Get the most recent version of each record:
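
A minimal sketch using the campaign_stats example above, assuming (date, campaign_id) is the logical key of the table:

```sql
-- Keep only the most recently loaded row for each (date, campaign_id) pair.
SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY date, campaign_id
      ORDER BY _quanti_loaded_at DESC
    ) AS rn
  FROM campaign_stats
) AS ranked
WHERE rn = 1;
```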

Create a deduplicated view:
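
The same idea wrapped in a view, so downstream consumers never see the duplicates; the view name campaign_stats_dedup is just an illustration:

```sql
-- Expose a deduplicated version of the raw table to downstream queries.
CREATE OR REPLACE VIEW campaign_stats_dedup AS
SELECT date, campaign_id, impressions, clicks, _quanti_loaded_at
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY date, campaign_id
      ORDER BY _quanti_loaded_at DESC
    ) AS rn
  FROM campaign_stats
) AS ranked
WHERE rn = 1;
```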

Aggregate unique events only:
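
Aggregations can then run on top of the deduplicated view from the previous sketch, so repeated loads of the same day are not double-counted:

```sql
-- Aggregate only the latest version of each record.
SELECT
  date,
  SUM(impressions) AS impressions,
  SUM(clicks)      AS clicks
FROM campaign_stats_dedup
GROUP BY date;
```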


Cost optimization summary

INSERT Mode = Lower processing costs + Higher storage costs

| Aspect    | Impact            | Explanation                            |
|-----------|-------------------|----------------------------------------|
| Insertion | ✅ Cheaper        | No WHERE clause = no data scanning     |
| Storage   | ❌ More expensive | Table grows with every sync            |
| Queries   | ⚠️ More expensive | Need to scan more rows and deduplicate |
| Overall   | ✅ Net positive   | Storage is cheaper than compute        |

Best use case: Raw data ingestion layers where deduplication is handled downstream in transformation pipelines.