Design a Data Warehouse (Snowflake/BigQuery)
Design a cloud-native data warehouse that can store petabytes of structured data and execute complex analytical queries — joins across billion-row tables, multi-dimensional aggregations, window functions — in seconds to minutes. If you have used Snowflake, BigQuery, or Redshift, you know the user experience we are targeting: an analyst writes SQL, hits run, and gets results across terabytes of data in under a minute.
The fundamental insight that shapes this entire design is how different analytical workloads are from transactional ones. An OLTP database like PostgreSQL is optimized for single-row reads and writes — find one customer, update their balance. A data warehouse does the opposite: it scans millions of rows but typically reads only a handful of columns at a time. That workload shape drives us toward columnar storage, where storing each column contiguously on disk means a query reading 3 of 50 columns only touches 6% of the data.
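The column-versus-row trade-off above can be sketched in a few lines. This is a toy illustration, not a real storage engine: the names and the tiny in-memory "table" are made up for this example, but it shows why a scan of one column touches every field in a row layout and only one array in a columnar layout.

```python
# Toy comparison of row-oriented vs column-oriented layouts.
# All names here are illustrative, not from any specific warehouse.
NUM_ROWS, NUM_COLS = 10_000, 50

# Row store: a list of row tuples; a query must walk every tuple
# even when it needs only a few fields.
row_store = [tuple(range(NUM_COLS)) for _ in range(NUM_ROWS)]

# Column store: one contiguous list per column; a query touches
# only the columns it actually reads.
column_store = [[c for _ in range(NUM_ROWS)] for c in range(NUM_COLS)]

# "SELECT SUM(col_7) FROM t" against each layout:
row_scan = sum(row[7] for row in row_store)  # touches all 50 fields per row
col_scan = sum(column_store[7])              # touches 1 of 50 columns

assert row_scan == col_scan
print(f"columns touched: row store = {NUM_COLS}, column store = 1")
print(f"fraction of data read for a 3-column query: {3 / NUM_COLS:.0%}")
```

The same arithmetic gives the figure in the text: reading 3 of 50 columns scans 3/50 = 6% of the stored bytes, before compression even enters the picture.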
The revolutionary architectural idea here is compute-storage separation, which Snowflake popularized. Data lives in cheap, durable cloud object storage (S3/GCS), and computation is provided by elastic virtual warehouse clusters that can be created, resized, and destroyed independently. This separation is what enables you to run 10 concurrent analytical workloads on the same dataset without them stepping on each other, scale compute up for a heavy ad-hoc query and back down to zero when the analyst goes to lunch, and pay only for compute you actually use. You will design a columnar storage layer, an MPP query engine, an ETL/ELT ingestion pipeline, and a multi-tenant resource management system.
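Compute-storage separation can be sketched as two independent layers. The classes below (`ObjectStore`, `VirtualWarehouse`) are hypothetical stand-ins invented for this sketch, not any vendor's API: one durable shared store plays the role of S3/GCS, and stateless compute clusters attach to it, resize, and suspend without touching the data.

```python
class ObjectStore:
    """Stands in for S3/GCS: the single durable copy of the table data."""
    def __init__(self):
        self._files = {}

    def put(self, key, data):
        self._files[key] = data

    def get(self, key):
        return self._files[key]


class VirtualWarehouse:
    """Stateless compute cluster; holds no data of its own."""
    def __init__(self, name, storage, nodes=1):
        self.name, self.storage, self.nodes = name, storage, nodes
        self.suspended = False

    def resize(self, nodes):
        # Scale up for a heavy ad-hoc query, down when load drops.
        self.nodes = nodes

    def suspend(self):
        # Drop compute cost to zero while the data stays durable.
        self.suspended = True

    def query_sum(self, key):
        assert not self.suspended, "warehouse is suspended"
        return sum(self.storage.get(key))


storage = ObjectStore()
storage.put("sales/amount", [120, 75, 310])

# Two independent workloads share one dataset without contention:
etl = VirtualWarehouse("etl", storage, nodes=4)
analysts = VirtualWarehouse("analysts", storage, nodes=2)
print(etl.query_sum("sales/amount"), analysts.query_sum("sales/amount"))

analysts.resize(8)   # scale up for an ad-hoc query
analysts.suspend()   # pay nothing while the analyst is at lunch
```

Because the warehouses never write to each other's state and only read from the shared store, adding a tenth concurrent workload is just constructing another `VirtualWarehouse`; the durable data is never copied or locked.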