githubEdit

Upsert mode : Update and insert

How it works

UPSERT Mode combines UPDATE and INSERT operations: existing rows (identified by their Primary Key) are updated, and new rows are inserted. No data is deleted.

This is Quanti's preferred insertion method as it provides the best balance between data preservation and accuracy.

1

Technical implementation

UPSERT Mode uses a MERGE statement (or equivalent) that:

  • Matches rows based on Primary Key fields (via _quanti_id)

  • Checks if data has actually changed (via _quanti_hash comparison)

  • Updates or inserts matched rows only if data changed

  • Inserts unmatched rows as new records

2

Change detection optimization

Quanti uses two technical fields to optimize operations:

  • _quanti_id: Composite identifier created by concatenating all Primary Key fields (fields with quantiId: true)

  • _quanti_hash: Hash of all table fields, used as a data fingerprint

When a row with matching _quanti_id is found, Quanti compares the _quanti_hash:

  • If different → data has changed → UPDATE.

  • If identical → data unchanged → SKIP (no operation performed)

This prevents unnecessary insertions when source data hasn't changed, significantly reducing storage costs and processing time.

Visual representation:

Existing data based on PK:
  Row A (PK=1): [old values] hash: abc123
  Row B (PK=2): [old values] hash: def456

New data:
  Row A (PK=1): [new values] hash: xyz789  ← PK matches, hash different → UPDATE
  Row B (PK=2): [old values] hash: def456  ← PK matches, hash identical → SKIP
  Row C (PK=3): [new values] hash: ghi012  ← PK new → INSERT

Result:
  Row A (PK=1): [new values] hash: xyz789  ← updated
  Row B (PK=2): [old values] hash: def456  ← unchanged (no operation)
  Row C (PK=3): [new values] hash: ghi012  ← inserted

Concrete example with a dimension table:

Impact example: If you sync 10,000 campaigns daily and only 500 change each day, Quanti will only perform operations on those 500 rows, skipping the 9,500 unchanged rows entirely.

triangle-exclamation

⚠️ Critical consideration: Performance impact on dimension tables


Critical characteristic: No deletion

⚠️ UPSERT Mode NEVER deletes rows from your table.

This is a fundamental difference from REPLACE Mode.

What happens when entities are deleted from source: the corresponding rows remain in your table (they are not removed by UPSERT). This preserves referential integrity and audit trails but can create orphaned or stale records over time.


UPSERT mode for historization

UPSERT Mode can be configured to preserve historical values rather than updating rows in place. This creates a new row for each change while keeping the old values.

Example of historization :

With historization, the _quanti_loaded_at field becomes crucial for identifying the most recent version of each entity.

Querying with historization:


When to historize

✅ Historize when:

  • Compliance or audit requirements mandate complete history

  • Business analysis requires tracking attribute changes over time

  • Dimension table is small (< 10,000 rows) and changes infrequently

  • Query volume on this table is low

❌ Avoid historization when:

  • Dimension table is large (> 100,000 rows) or changes frequently

  • High query volume on this table

  • Only current state matters for business needs

  • Storage and query costs are a concern

  • No specific requirement for historical tracking


How to Configure Historization

Historization in UPSERT Mode is controlled through Primary Key configuration during the connector setup (Mapping step). By selecting which fields form your table's unique identifier, you control which changes create new historical rows versus updating existing ones.

The Primary Key Principle

When configuring your table mapping, you select which fields are part of the Primary Key by checking their boxes in the Mapping step. This choice directly determines historization behavior.

Key concept:

  • Fields checked as Primary Key → Changes to these fields create new rows (historization)

  • Fields not checked → Changes to these fields update existing rows (no historization)

What is a Primary Key? The Primary Key is the combination of fields that uniquely identifies each row in your table. For example, for a campaigns table, campaign_id uniquely identifies each campaign. If you want to track how that campaign changes over time, you can add more fields to the Primary Key (like date or status), which will create a new row for each distinct combination.

Example 1: Standard setup (no historization)

Scenario: Keep only the current state of each campaign

Mapping configuration:

Primary Key selected: campaign_id only

Behavior:

Result: Only current state is kept. No history of the status change.


Example 2: Track changes over time with date

Scenario: Create a daily snapshot to track how campaign attributes evolve

Mapping configuration:

Primary Key selected: campaign_id + date

Behavior:

Result: Full history preserved. You can see the campaign's status on any date.


Example 3: Track only status changes

Scenario: Create a new row only when the status changes, but update other fields in place

Mapping configuration:

Primary Key selected: campaign_id + status

Behavior:

Result:

  • Status changes → new rows (tracked over time)

  • Name/budget changes → update existing row (not tracked)

triangle-exclamation

⚠️ Critical consideration: Performance impact on dimension tables


circle-check

Advantages


circle-exclamation

Disadvantages


Use cases

Ideal for:

  • Dimension tables (campaigns, ad groups, products, customers) — reflect updates while preserving referential integrity (avoid historization on large, frequently-changing dimension tables)

  • Master data and reference tables (account hierarchies, org structures)

  • Fact tables where updates are expected (order status, conversion tracking) — partitioning mitigates historization impact

  • Tables requiring compliance/audit trails (user permissions, config changes)

  • Mixed scenarios (new + updates + unchanged), e.g., product catalogs

Not suitable for:

  • Pure event streams or logs (events should be INSERT-only)

  • Tables requiring explicit deletion of entities (use REPLACE mode)

  • Large dimension tables with high change frequency + historization (consider REPLACE or snapshots)



Historization vs. No Historization

Aspect
With Historization
Without Historization

Update behavior

New row inserted

Existing row updated in place

Historical tracking

✅ Full history preserved

❌ Only current state

Table growth

📈 Grows with each change

📊 Stable (one row per entity)

Query complexity

⚠️ Need to filter latest version

✅ Direct access to current state

Storage cost

❌ Higher (multiple versions)

✅ Lower (single version)

Query cost (dimension tables)

❌❌ VERY HIGH (full table scans)

✅ Low

Audit capability

✅ Complete audit trail

❌ No audit trail

Temporal analysis

✅ "State at time X" queries

❌ Not possible

Performance impact

❌❌ Severe on dimensions

✅ Minimal

Use case

Small tables, compliance, rare queries

Most dimension tables


Last updated