The SQL that actually works on real data.
Battle-tested BigQuery queries for Datastream-replicated tables. Every query here has been run, benchmarked, and validated. The key insight: deduplication is everything — get it wrong and your numbers are fiction.
How to use this library
This library is structured for both humans and AI agents. Each query includes metadata comments (@id, @source, @scanned_gb, @cost_usd) and follows strict patterns that can be composed.
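A query header in this style might look like the following (the tag values are illustrative, not copied from a real entry):

```sql
-- @id: active_leads_today
-- @source: prod.leads
-- @scanned_gb: 1.2
-- @cost_usd: 0.0075
SELECT ...
```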
Always dedup first. Wrap every base table in a CTE with QUALIFY ROW_NUMBER() OVER (PARTITION BY id/code …) = 1 before joining or aggregating.
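A minimal sketch of the dedup pattern. The table name, key column, and ordering column are assumptions for illustration; Datastream-replicated tables typically carry a `datastream_metadata.source_timestamp` to order on so the latest row version wins:

```sql
-- Deduplicate a Datastream-replicated table before any join or aggregate.
-- `prod.leads`, `id`, and the ordering column are illustrative.
WITH leads AS (
  SELECT *
  FROM `prod.leads`
  WHERE workspace_code = 'crdhs'
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY datastream_metadata.source_timestamp DESC  -- keep the latest version
  ) = 1
)
SELECT COUNT(*) AS lead_count
FROM leads;
```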
Two lead scopes. Use all_leads (any lead, not deleted) for headcount. Use scoped_leads (active=TRUE only) for visit/payment KPIs.
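The two scopes can be sketched as CTEs over one deduplicated base; the `is_deleted` column and table name are assumptions for illustration:

```sql
-- Two lead scopes over the same deduplicated base (names illustrative).
WITH base AS (
  SELECT *
  FROM `prod.leads`
  WHERE workspace_code = 'crdhs'
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY datastream_metadata.source_timestamp DESC
  ) = 1
),
all_leads AS (     -- headcount: any lead that is not deleted
  SELECT * FROM base WHERE is_deleted = FALSE
),
scoped_leads AS (  -- visit/payment KPIs: active leads only
  SELECT * FROM base WHERE is_deleted = FALSE AND is_active = TRUE
)
SELECT
  (SELECT COUNT(*) FROM all_leads)    AS total_leads,
  (SELECT COUNT(*) FROM scoped_leads) AS active_leads;
```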
Never COALESCE is_active. Write is_active = TRUE not COALESCE(is_active, TRUE) = TRUE — the latter treats NULL as active and inflates counts.
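Side by side, assuming a deduplicated `leads` CTE as above:

```sql
-- WRONG: COALESCE maps NULL is_active to TRUE, so rows with unknown
-- status are counted as active and the count is inflated.
-- SELECT COUNT(*) FROM leads WHERE COALESCE(is_active, TRUE) = TRUE;

-- RIGHT: NULL = TRUE evaluates to NULL, so NULL rows are excluded.
SELECT COUNT(*) FROM leads WHERE is_active = TRUE;
```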
Timezone always Asia/Kolkata. All date functions: DATE(ts, 'Asia/Kolkata') and CURRENT_DATE('Asia/Kolkata').
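For example, counting today's rows in IST (table and column names are illustrative):

```sql
-- Convert the timestamp to an IST date before comparing to today's IST date.
SELECT COUNT(*) AS todays_visits
FROM visits
WHERE DATE(visited_at, 'Asia/Kolkata') = CURRENT_DATE('Asia/Kolkata');
```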
workspace_code scoping. Always filter WHERE workspace_code = 'crdhs' on all prod tables — they contain data for all tenants.
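Applied to any prod table (the table name here is illustrative):

```sql
-- Every prod table is multi-tenant; scope to the workspace first.
SELECT COUNT(*) AS crdhs_payments
FROM `prod.payments`
WHERE workspace_code = 'crdhs';
```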
Combined queries save 93%. Run all 24 KPIs in one BigQuery job by stacking scalar subqueries in a single SELECT. One scan, one bill.
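The stacking pattern looks like this, with two KPIs standing in for all 24; the shared CTE means the base table is referenced once and each KPI is a scalar subquery over it (names illustrative):

```sql
-- One BigQuery job, many KPIs: extend the SELECT list with more
-- scalar subqueries instead of issuing separate billed queries.
WITH leads AS (
  SELECT *
  FROM `prod.leads`
  WHERE workspace_code = 'crdhs'
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY datastream_metadata.source_timestamp DESC
  ) = 1
)
SELECT
  (SELECT COUNT(*) FROM leads)                        AS kpi_total_leads,
  (SELECT COUNT(*) FROM leads WHERE is_active = TRUE) AS kpi_active_leads;
```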