Query Library
Last run: 2026-03-11 prod · live For AI Agents GitHub
Open Source · BigQuery · Datastream

The SQL that actually
works on real data.

Battle-tested BigQuery queries for Datastream-replicated tables. Every query here has been run, benchmarked, and validated. The key insight: deduplication is everything — get it wrong and your numbers are fiction.

Queries
24 Field KPIs
93% Cost Reduction
prod Data Source
★ Signature Pattern

Deduplication is the foundation.

Datastream CDC replication creates duplicate rows. Every query in this library uses QUALIFY to resolve to the single latest record before any KPI logic runs. Skip this and your counts are wrong — always.

-- Standard Datastream dedup — use in every CTE QUALIFY ROW_NUMBER() OVER ( PARTITION BY code ORDER BY COALESCE(updated_at, created_at) DESC, datastream_metadata.source_timestamp DESC ) = 1
All
Deduplication
Agent Scoping
Dashboard
Attendance
Visits
Collections
MTD
Lender

How to use this library

This library is structured for both humans and AI agents. Each query includes metadata comments (@id, @source, @scanned_gb, @cost_usd) and follows strict patterns that can be composed.

Rule 01

Always dedup first. Wrap every base table in a CTE with QUALIFY ROW_NUMBER() OVER (PARTITION BY id/code …) = 1 before joining or aggregating.

Rule 02

Two lead scopes. Use all_leads (any lead, not deleted) for headcount. Use scoped_leads (active=TRUE only) for visit/payment KPIs.

Rule 03

Never COALESCE is_active. Write is_active = TRUE not COALESCE(is_active, TRUE) = TRUE — the latter treats NULL as active and inflates counts.

Rule 04

Timezone always Asia/Kolkata. All date functions: DATE(ts, 'Asia/Kolkata') and CURRENT_DATE('Asia/Kolkata').

Rule 05

workspace_code scoping. Always filter WHERE workspace_code = 'crdhs' on all prod tables — they contain data for all tenants.

Rule 06

Combined queries save 93%. Run all 24 KPIs in one BigQuery job by stacking scalar subqueries in a single SELECT. One scan, one bill.