Upsert mode: Update and insert
How it works
UPSERT Mode combines UPDATE and INSERT operations: existing rows (identified by their Primary Key) are updated, and new rows are inserted. No data is deleted.
This is Quanti's preferred insertion method as it provides the best balance between data preservation and accuracy.
Change detection optimization
Quanti uses two technical fields to optimize operations:
_quanti_id: Composite identifier created by concatenating all Primary Key fields (fields with quantiId: true)
_quanti_hash: Hash of all table fields, used as a data fingerprint
When a row with matching _quanti_id is found, Quanti compares the _quanti_hash:
If different → data has changed → UPDATE.
If identical → data unchanged → SKIP (no operation performed)
This prevents unnecessary insertions when source data hasn't changed, significantly reducing storage costs and processing time.
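For intuition, here is a minimal Python sketch of this decision logic. The field names, the join character, and the SHA-256 hashing are illustrative assumptions, not Quanti's actual implementation.

```python
import hashlib

def quanti_id(row: dict, pk_fields: list[str]) -> str:
    # Composite identifier: all Primary Key field values concatenated
    return "|".join(str(row[f]) for f in pk_fields)

def quanti_hash(row: dict) -> str:
    # Fingerprint of all field values (illustrative hashing scheme)
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode()).hexdigest()

def upsert_action(existing: dict[str, str], row: dict, pk_fields: list[str]) -> str:
    # `existing` maps _quanti_id -> _quanti_hash for rows already in the table
    row_id, row_hash = quanti_id(row, pk_fields), quanti_hash(row)
    if row_id not in existing:
        return "INSERT"   # new Primary Key -> new row
    if existing[row_id] != row_hash:
        return "UPDATE"   # same Primary Key, data changed
    return "SKIP"         # same Primary Key, identical fingerprint -> no operation

existing = {
    "1": quanti_hash({"campaign_id": 1, "status": "active"}),
    "2": quanti_hash({"campaign_id": 2, "status": "paused"}),
}
new_data = [
    {"campaign_id": 1, "status": "paused"},   # changed
    {"campaign_id": 2, "status": "paused"},   # unchanged
    {"campaign_id": 3, "status": "active"},   # new
]
for row in new_data:
    print(row["campaign_id"], upsert_action(existing, row, ["campaign_id"]))
# 1 UPDATE / 2 SKIP / 3 INSERT
```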
Visual representation:
Existing data based on PK:
Row A (PK=1): [old values] hash: abc123
Row B (PK=2): [old values] hash: def456
New data:
Row A (PK=1): [new values] hash: xyz789 ← PK matches, hash different → UPDATE
Row B (PK=2): [old values] hash: def456 ← PK matches, hash identical → SKIP
Row C (PK=3): [new values] hash: ghi012 ← PK new → INSERT
Result:
Row A (PK=1): [new values] hash: xyz789 ← updated
Row B (PK=2): [old values] hash: def456 ← unchanged (no operation)
Row C (PK=3): [new values] hash: ghi012 ← inserted
Concrete example with a dimension table:
Impact example: If you sync 10,000 campaigns daily and only 500 change each day, Quanti will only perform operations on those 500 rows, skipping the 9,500 unchanged rows entirely.
Critical characteristic: No deletion
⚠️ UPSERT Mode NEVER deletes rows from your table.
This is a fundamental difference from REPLACE Mode.
What happens when entities are deleted from source: the corresponding rows remain in your table (they are not removed by UPSERT). This preserves referential integrity and audit trails but can create orphaned or stale records over time.
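If you need to clean these up, one approach is to periodically compare the table against a full extract of current entity IDs from the source. A minimal sketch, assuming such an ID list is available (the IDs below are made up):

```python
# IDs present in the warehouse table vs. a fresh, full extract of IDs from the source
table_ids  = {"123", "456", "789"}
source_ids = {"123", "456"}   # campaign 789 was deleted at the source

# UPSERT never removes 789, so flag it for review or manual cleanup
orphaned = table_ids - source_ids
print(orphaned)   # {'789'}
```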
UPSERT mode for historization
UPSERT Mode can be configured to preserve historical values rather than updating rows in place. This creates a new row for each change while keeping the old values.
Example of historization:
Day 1 sync: Campaign 123 arrives with status "active" → row inserted (_quanti_loaded_at = Day 1)
Day 2 sync: Campaign 123 now has status "paused" → a new row is inserted (_quanti_loaded_at = Day 2), and the Day 1 row is kept
With historization, the _quanti_loaded_at field becomes crucial for identifying the most recent version of each entity.
Querying with historization:
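In practice this means keeping, for each _quanti_id, only the row with the most recent _quanti_loaded_at. Below is a minimal Python sketch of that filter (row values are made up; in a real warehouse this would be a SQL query, typically a window function partitioned by _quanti_id and ordered by _quanti_loaded_at descending):

```python
from datetime import date

rows = [
    {"_quanti_id": "123", "_quanti_loaded_at": date(2024, 1, 1), "status": "active"},
    {"_quanti_id": "123", "_quanti_loaded_at": date(2024, 1, 2), "status": "paused"},
    {"_quanti_id": "456", "_quanti_loaded_at": date(2024, 1, 1), "status": "active"},
]

def latest_versions(rows: list[dict]) -> list[dict]:
    # Keep only the most recently loaded row for each _quanti_id
    latest: dict[str, dict] = {}
    for row in rows:
        best = latest.get(row["_quanti_id"])
        if best is None or row["_quanti_loaded_at"] > best["_quanti_loaded_at"]:
            latest[row["_quanti_id"]] = row
    return list(latest.values())

for row in latest_versions(rows):
    print(row["_quanti_id"], row["status"])   # 123 paused / 456 active
```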
When to historize
✅ Historize when:
Compliance or audit requirements mandate complete history
Business analysis requires tracking attribute changes over time
Dimension table is small (< 10,000 rows) and changes infrequently
Query volume on this table is low
❌ Avoid historization when:
Dimension table is large (> 100,000 rows) or changes frequently
High query volume on this table
Only current state matters for business needs
Storage and query costs are a concern
No specific requirement for historical tracking
How to configure historization
Historization in UPSERT Mode is controlled through Primary Key configuration during the connector setup (Mapping step). By selecting which fields form your table's unique identifier, you control which changes create new historical rows versus updating existing ones.
The Primary Key principle
When configuring your table mapping, you select which fields are part of the Primary Key by checking their boxes in the Mapping step. This choice directly determines historization behavior.
Key concept:
Fields checked as Primary Key → Changes to these fields create new rows (historization)
Fields not checked → Changes to these fields update existing rows (no historization)
What is a Primary Key? The Primary Key is the combination of fields that uniquely identifies each row in your table. For example, for a campaigns table, campaign_id uniquely identifies each campaign. If you want to track how that campaign changes over time, you can add more fields to the Primary Key (like date or status), which will create a new row for each distinct combination.
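As a sketch of the principle, assume the composite identifier is simply the checked fields joined together (the rows and field values below are invented):

```python
def key(row: dict, pk_fields: list[str]) -> str:
    # The composite _quanti_id: the checked fields joined together
    return "|".join(str(row[f]) for f in pk_fields)

monday  = {"campaign_id": 123, "date": "2024-01-01", "status": "active"}
tuesday = {"campaign_id": 123, "date": "2024-01-02", "status": "paused"}

# Primary Key = campaign_id only: same key both days -> Tuesday updates Monday's row
print(key(monday, ["campaign_id"]), "==", key(tuesday, ["campaign_id"]))
# 123 == 123

# Primary Key = campaign_id + date: different keys -> Tuesday becomes a new row
print(key(monday, ["campaign_id", "date"]), "!=", key(tuesday, ["campaign_id", "date"]))
# 123|2024-01-01 != 123|2024-01-02
```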
Example 1: Standard setup (no historization)
Scenario: Keep only the current state of each campaign
Mapping configuration:
Primary Key selected: campaign_id only
Behavior: When a campaign's attributes change (for example its status), the existing row matched on campaign_id is updated in place; no new row is created.
Result: Only current state is kept. No history of the status change.
Example 2: Track changes over time with date
Scenario: Create a daily snapshot to track how campaign attributes evolve
Mapping configuration:
Primary Key selected: campaign_id + date
Behavior: Each campaign_id + date combination is a new Primary Key, so every daily sync inserts a new snapshot row per campaign instead of overwriting the previous one.
Result: Full history preserved. You can see the campaign's status on any date.
Example 3: Track only status changes
Scenario: Create a new row only when the status changes, but update other fields in place
Mapping configuration:
Primary Key selected: campaign_id + status
Behavior: A status change produces a new campaign_id + status combination, so a new row is inserted; changes to fields outside the Primary Key update the current row in place.
Result:
Status changes → new rows (tracked over time)
Name/budget changes → update existing row (not tracked)
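A toy simulation of Example 3, under the same assumption that the row key is the concatenation of the checked fields (values are invented):

```python
def key(row: dict, pk_fields: list[str]) -> str:
    return "|".join(str(row[f]) for f in pk_fields)

def upsert(table: dict, row: dict, pk_fields: list[str]) -> None:
    # New Primary Key combination -> new row; existing combination -> updated in place
    table[key(row, pk_fields)] = row

pk = ["campaign_id", "status"]
table: dict[str, dict] = {}

upsert(table, {"campaign_id": 123, "status": "active", "budget": 100}, pk)
upsert(table, {"campaign_id": 123, "status": "active", "budget": 150}, pk)  # budget change: same key, update
upsert(table, {"campaign_id": 123, "status": "paused", "budget": 150}, pk)  # status change: new key, new row

for k, row in table.items():
    print(k, row)
# 123|active {'campaign_id': 123, 'status': 'active', 'budget': 150}
# 123|paused {'campaign_id': 123, 'status': 'paused', 'budget': 150}
```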
⚠️ Critical consideration: Performance impact on dimension tables
Dimension tables are typically NOT partitioned (unlike fact tables). As a result, historized dimension tables require scanning the ENTIRE table to find the latest version of each entity, which significantly increases query cost and storage and degrades performance.
Example consequences:
Table size growth: 10,000 rows → 500,000 rows (50 changes/campaign/year)
Query scans: 10,000 rows → 500,000 rows (50x more expensive)
Monthly cost example: ~$5 → ~$250 (50x increase)
Why this doesn't affect fact tables:
Fact tables are partitioned by date (_quanti_date) and queries typically include date filters, so partition pruning reduces scanned data.
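As a conceptual illustration only (not how a real warehouse stores data), date partitions can be pictured as buckets keyed by _quanti_date; a date-filtered query only touches the matching buckets:

```python
from collections import defaultdict

# A fact table pictured as one bucket of rows per _quanti_date
partitions: dict[str, list[dict]] = defaultdict(list)
for row in [
    {"_quanti_date": "2024-01-01", "clicks": 10},
    {"_quanti_date": "2024-01-02", "clicks": 7},
    {"_quanti_date": "2024-01-03", "clicks": 12},
]:
    partitions[row["_quanti_date"]].append(row)

# A query filtered on _quanti_date reads only the matching buckets...
pruned_scan = [row for day in ["2024-01-03"] for row in partitions[day]]

# ...whereas an unfiltered (or unpartitioned) query reads everything
full_scan = [row for rows in partitions.values() for row in rows]

print(len(pruned_scan), "row(s) scanned with pruning vs", len(full_scan), "without")
```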
Advantages
✅ No data loss — Existing rows that don't appear in new data are preserved
✅ Handles updates gracefully — Changes to existing entities are reflected immediately, no duplicates
✅ Supports historization — Can track attribute changes over time (with performance considerations)
✅ Flexible and robust — Works for both fact and dimension tables; resilient to partial source data
✅ Optimized with hash comparison — Skips unchanged rows entirely (_quanti_hash comparison)
Disadvantages
❌ Moderate processing costs — MERGE operations require PK matching and scanning
❌ Table growth with historization — Storage costs increase with each change
❌ Complex queries with historization — Need to filter latest version using _quanti_loaded_at
❌ Primary Key dependency — Requires accurate PK definition (_quanti_id)
❌ Cannot remove deleted entities — Orphaned records remain unless manually cleaned
Use cases
Ideal for:
Dimension tables (campaigns, ad groups, products, customers) — reflect updates while preserving referential integrity (avoid historization on large, frequently-changing dimension tables)
Master data and reference tables (account hierarchies, org structures)
Fact tables where updates are expected (order status, conversion tracking) — partitioning mitigates historization impact
Tables requiring compliance/audit trails (user permissions, config changes)
Mixed scenarios (new + updates + unchanged), e.g., product catalogs
Not suitable for:
Pure event streams or logs (events should be INSERT-only)
Tables requiring explicit deletion of entities (use REPLACE mode)
Large dimension tables with high change frequency + historization (consider REPLACE or snapshots)
Historization vs. No Historization
Criterion | With historization | Without historization
Update behavior | New row inserted | Existing row updated in place
Historical tracking | ✅ Full history preserved | ❌ Only current state
Table growth | 📈 Grows with each change | 📊 Stable (one row per entity)
Query complexity | ⚠️ Need to filter latest version | ✅ Direct access to current state
Storage cost | ❌ Higher (multiple versions) | ✅ Lower (single version)
Query cost (dimension tables) | ❌❌ VERY HIGH (full table scans) | ✅ Low
Audit capability | ✅ Complete audit trail | ❌ No audit trail
Temporal analysis | ✅ "State at time X" queries | ❌ Not possible
Performance impact | ❌❌ Severe on dimensions | ✅ Minimal
Use case | Small tables, compliance, rare queries | Most dimension tables