The SQL that actually works on real data.
Battle-tested BigQuery queries for Datastream-replicated tables. Every query here has been run, benchmarked, and validated. The key insight: deduplication is everything — get it wrong and your numbers are fiction.
How to use this library
This library is structured for both humans and AI agents. Each query includes metadata comments (@id, @source, @scanned_gb, @cost_usd) and follows strict patterns that can be composed.
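A query header in this style might look like the following (the tag values are illustrative, not copied from a real entry):

```sql
-- @id: active_leads_today
-- @source: prod.leads
-- @scanned_gb: 1.2
-- @cost_usd: 0.0075
SELECT ...
```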
Always dedup first. Wrap every base table in a CTE with QUALIFY ROW_NUMBER() OVER (PARTITION BY id/code …) = 1 before joining or aggregating.
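A minimal sketch of the dedup pattern. The table name, key column, and ordering column are assumptions for illustration; Datastream-replicated tables typically carry a `datastream_metadata.source_timestamp` to order on so the latest row version wins:

```sql
-- Deduplicate a Datastream-replicated table before any join or aggregate.
-- `prod.leads`, `id`, and the ordering column are illustrative.
WITH leads AS (
  SELECT *
  FROM `prod.leads`
  WHERE workspace_code = 'crdhs'
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY datastream_metadata.source_timestamp DESC  -- keep the latest version
  ) = 1
)
SELECT COUNT(*) AS lead_count
FROM leads;
```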
Two lead scopes. Use all_leads (any lead, not deleted) for headcount. Use scoped_leads (active=TRUE only) for visit/payment KPIs.
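The two scopes can be sketched as CTEs over one deduplicated base; the `is_deleted` column and table name are assumptions for illustration:

```sql
-- Two lead scopes over the same deduplicated base (names illustrative).
WITH base AS (
  SELECT *
  FROM `prod.leads`
  WHERE workspace_code = 'crdhs'
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY datastream_metadata.source_timestamp DESC
  ) = 1
),
all_leads AS (     -- headcount: any lead that is not deleted
  SELECT * FROM base WHERE is_deleted = FALSE
),
scoped_leads AS (  -- visit/payment KPIs: active leads only
  SELECT * FROM base WHERE is_deleted = FALSE AND is_active = TRUE
)
SELECT
  (SELECT COUNT(*) FROM all_leads)    AS total_leads,
  (SELECT COUNT(*) FROM scoped_leads) AS active_leads;
```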
Never COALESCE is_active. Write is_active = TRUE not COALESCE(is_active, TRUE) = TRUE — the latter treats NULL as active and inflates counts.
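Side by side, assuming a deduplicated `leads` CTE as above:

```sql
-- WRONG: COALESCE maps NULL is_active to TRUE, so rows with unknown
-- status are counted as active and the count is inflated.
-- SELECT COUNT(*) FROM leads WHERE COALESCE(is_active, TRUE) = TRUE;

-- RIGHT: NULL = TRUE evaluates to NULL, so NULL rows are excluded.
SELECT COUNT(*) FROM leads WHERE is_active = TRUE;
```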
Timezone always Asia/Kolkata. All date functions: DATE(ts, 'Asia/Kolkata') and CURRENT_DATE('Asia/Kolkata').
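For example, counting today's rows in IST (table and column names are illustrative):

```sql
-- Convert the timestamp to an IST date before comparing to today's IST date.
SELECT COUNT(*) AS todays_visits
FROM visits
WHERE DATE(visited_at, 'Asia/Kolkata') = CURRENT_DATE('Asia/Kolkata');
```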
workspace_code scoping. Always filter WHERE workspace_code = 'crdhs' on all prod tables — they contain data for all tenants.
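Applied to any prod table (the table name here is illustrative):

```sql
-- Every prod table is multi-tenant; scope to the workspace first.
SELECT COUNT(*) AS crdhs_payments
FROM `prod.payments`
WHERE workspace_code = 'crdhs';
```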
Combined queries save 93%. Run all 24 KPIs in one BigQuery job by stacking scalar subqueries in a single SELECT. One scan, one bill.
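The stacking pattern looks like this, with two KPIs standing in for all 24; the shared CTE means the base table is referenced once and each KPI is a scalar subquery over it (names illustrative):

```sql
-- One BigQuery job, many KPIs: extend the SELECT list with more
-- scalar subqueries instead of issuing separate billed queries.
WITH leads AS (
  SELECT *
  FROM `prod.leads`
  WHERE workspace_code = 'crdhs'
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY datastream_metadata.source_timestamp DESC
  ) = 1
)
SELECT
  (SELECT COUNT(*) FROM leads)                        AS kpi_total_leads,
  (SELECT COUNT(*) FROM leads WHERE is_active = TRUE) AS kpi_active_leads;
```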