Data insertion strategies

Introduction

When synchronizing data from your source platforms to your data warehouse, a fundamental question arises: how should new data interact with existing data?

The answer lies in choosing the appropriate insertion strategy. This choice determines whether data is added, replaced, or updated, and has direct consequences on data quality, storage costs, and analytical capabilities.

The three insertion strategies Quanti uses:

Append new data without deleting anything.

Use case: when you want to keep a full audit trail of every record as ingested, and duplicates are acceptable or deduplicated later.

Advantages: simple, append-only; minimal risk of accidental data loss.

Limitations: can lead to duplicate rows, larger storage needs, and more complex downstream deduplication logic.

Quanti automatically selects the most appropriate insertion method for each table in its connectors. The preferred method is UPSERT Mode, which offers the best balance between data preservation and updates. However, depending on the nature of the data and business requirements, INSERT or REPLACE methods may be more suitable.

⚠️ Critical consideration: Performance impact on dimension tables

Dimension tables are typically NOT partitioned (unlike fact tables). This means historized dimension tables require scanning the ENTIRE table to find the latest version of each entity, causing large increases in query cost, storage, and degraded performance.

Example consequences:

Table size growth: 10,000 rows → 500,000 rows (50 changes/campaign/year)
Query scans: 10,000 rows → 500,000 rows (50x more expensive)
Monthly cost example: ~$5 → ~$250 (50x increase)

Why this doesn't affect fact tables:

Fact tables are partitioned by date (_quanti_date) and queries typically include date filters so partition pruning reduces scanned data.

Why this matters

Understanding insertion strategies is crucial to:

Comprehend the behavior of your data over time
Anticipate how updates and corrections are handled
Identify potential risks (duplicates, data loss, performance issues)
Make informed decisions when configuring custom connectors
Troubleshoot data inconsistencies effectively

What you'll learn

This section provides a detailed explanation of each insertion method, including:

How each method works technically
When to use each method
Advantages and limitations
Concrete examples with fact and dimension tables
Impact on data warehouse performance and costs

PreviousLookback window NextInsert mode : Insert without deleting

Last updated 1 day ago

hashtagIntroduction

hashtag⚠️ Critical consideration: Performance impact on dimension tables

hashtagWhy this matters

hashtagWhat you'll learn

Introduction

⚠️ Critical consideration: Performance impact on dimension tables

Why this matters

What you'll learn