Medallion Architecture Explained (Without the Jargon)
Bronze, Silver, Gold — what these layers actually mean, why the structure matters, and how to implement it without enterprise-scale complexity.
If you’ve spent any time reading about modern data architecture, you’ve probably come across the term “Medallion architecture.” It shows up in Databricks documentation, data engineering blogs, and conference talks — usually surrounded by enough jargon to make it sound more complicated than it is.
The concept is actually straightforward. Here’s what it means, why it works, and how mid-sized companies can implement it without needing a team of 20 engineers.
The Core Idea
Medallion architecture organizes data into three progressive layers, each more refined than the last:
- Bronze: Raw data, exactly as it came from the source
- Silver: Cleaned, validated, and standardized data
- Gold: Business-ready data optimized for reporting and analysis
The key insight is that each transformation is a separate, auditable step. You never overwrite the original data. If something goes wrong at the Silver layer, you still have Bronze to fall back on. If business logic changes, you rebuild the affected layer from the one below it (Silver from Bronze, Gold from Silver), and the raw data in Bronze is never modified.
This sounds obvious in retrospect, but it’s different from how most companies actually manage data — which is usually a tangle of transformations that nobody fully understands, applied directly to operational data.
Bronze: The Raw Archive
The Bronze layer is a faithful copy of every source system, stored in its original structure. The goal here is completeness and fidelity, not cleanliness.
If your ERP exports a CSV with inconsistent date formats, those inconsistencies go into Bronze. If your CRM uses null in one field and an empty string in another, that inconsistency goes into Bronze too.
Why preserve the mess? Because it gives you a complete audit trail. If a number looks wrong six months from now, you can trace it all the way back to the original source record. You can also replay the entire pipeline from scratch if you ever need to.
Practically speaking, Bronze data is typically stored as Parquet files: compressed, columnar, and readable by most modern data tools. Storage is cheap, so you never delete anything.
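As a rough illustration of what "faithful copy" means in practice, here is a minimal sketch of landing a source export in Bronze. Everything in it is an assumption made for the example: the `ERP_EXPORT` sample, the `land_in_bronze` function, and the `_source` / `_ingested_at` metadata columns are hypothetical names, and a plain Python list stands in for Parquet files so the snippet runs with the standard library alone.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical ERP export: two different date formats, an empty amount,
# and a duplicated row. All of it goes into Bronze untouched.
ERP_EXPORT = """order_id,order_date,amount
1001,2024-03-05,250.00
1002,05/03/2024,
1002,05/03/2024,
"""

def land_in_bronze(raw_csv, source, bronze):
    """Append every source row to Bronze unchanged, plus ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Values stay as the original strings: no parsing, no deduplication, no fixes.
        bronze.append({**row, "_source": source, "_ingested_at": ingested_at})
    return bronze

bronze_orders = []  # stand-in for an append-only Bronze table
land_in_bronze(ERP_EXPORT, "erp", bronze_orders)

print(len(bronze_orders))              # 3: the duplicate is kept on purpose
print(bronze_orders[1]["order_date"])  # 05/03/2024: original format preserved
```

The only things added are metadata about where and when the row arrived. The source values themselves are never reinterpreted at this layer.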
Silver: The Source of Truth
The Silver layer is where business logic is applied. This is where you:
- Standardize date formats
- Deduplicate records that appear in multiple systems
- Apply consistent naming conventions (same customer ID across CRM and ERP)
- Fill in missing values according to defined rules
- Flag records that need manual review
The critical thing about Silver is that the transformations are code, not spreadsheet formulas, manual steps, or someone’s tribal knowledge. They’re version-controlled SQL transformations (dbt is a widely used tool for this) that can be reviewed, tested, and rolled back.
This is what makes Silver a genuine “source of truth.” Anyone in the organization can look at a Silver table, trace exactly how it was built, and understand where each value came from.
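To make the Silver steps listed above concrete, here is a small sketch in plain Python rather than dbt SQL. It covers three of them: standardizing dates, deduplicating, and flagging records for review. The sample rows, the `build_silver` and `parse_date` functions, the `needs_review` column, and the rule that the second date format is day-first are all assumptions for illustration, not a prescribed implementation.

```python
from datetime import datetime

# Hypothetical Bronze rows: raw strings, mixed date formats, one duplicate.
bronze_orders = [
    {"order_id": "1001", "order_date": "2024-03-05", "amount": "250.00"},
    {"order_id": "1002", "order_date": "05/03/2024", "amount": ""},
    {"order_id": "1002", "order_date": "05/03/2024", "amount": ""},
]

# Assumed rule: the source emits only these two formats, the second one day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def build_silver(bronze_rows):
    """Build Silver rows from Bronze without modifying the Bronze rows."""
    silver = {}
    for row in bronze_rows:
        order_date = parse_date(row["order_date"])
        amount = float(row["amount"]) if row["amount"] else None
        # Keyed by order_id, so a repeated ID replaces the earlier copy (dedup).
        silver[row["order_id"]] = {
            "order_id": row["order_id"],
            "order_date": order_date,
            "amount": amount,
            # Flag instead of guessing when a value cannot be cleaned by rule.
            "needs_review": order_date is None or amount is None,
        }
    return list(silver.values())

silver_orders = build_silver(bronze_orders)
```

Because `build_silver` only reads from Bronze, it can be rerun at any time with different rules, and the result is always derived from the same untouched input.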
Gold: Ready to Use
The Gold layer contains the datasets that business users actually interact with — the tables that power dashboards, reports, and analytical queries.
Gold tables are built with the end user in mind. They’re pre-aggregated where appropriate, they use business terminology (not system field names), and they’re structured for the specific questions they’re meant to answer.
A Gold table for financial reporting might include pre-calculated gross margin by product line, with exchange rates already applied and revenue recognition rules already enforced. An analyst can query it with simple SQL and get a correct answer — without needing to know anything about how it was built.
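A sketch of how such a Gold table might be built follows. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without extra installs. The table names (`silver_sales`, `silver_fx`, `gold_gross_margin`), the columns, and the figures are invented for the example, including the deliberately round exchange rate. Revenue recognition rules are not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical Silver inputs (made-up numbers).
CREATE TABLE silver_sales (product_line TEXT, currency TEXT, revenue REAL, cost REAL);
INSERT INTO silver_sales VALUES
    ('Hardware', 'USD', 1000.0, 600.0),
    ('Hardware', 'EUR',  500.0, 300.0),
    ('Services', 'USD',  800.0, 200.0);

CREATE TABLE silver_fx (currency TEXT, usd_rate REAL);
INSERT INTO silver_fx VALUES ('USD', 1.0), ('EUR', 2.0);

-- Gold: pre-aggregated, business-named columns, exchange rates already applied.
CREATE TABLE gold_gross_margin AS
SELECT s.product_line,
       SUM(s.revenue * f.usd_rate)            AS revenue_usd,
       SUM((s.revenue - s.cost) * f.usd_rate) AS gross_margin_usd
FROM silver_sales AS s
JOIN silver_fx AS f ON f.currency = s.currency
GROUP BY s.product_line;
""")

# What an analyst would run: simple SQL, no knowledge of the upstream logic needed.
rows = conn.execute(
    "SELECT product_line, revenue_usd, gross_margin_usd "
    "FROM gold_gross_margin ORDER BY product_line"
).fetchall()
print(rows)  # [('Hardware', 2000.0, 800.0), ('Services', 800.0, 600.0)]
```

The currency conversion and the margin formula live in the statement that builds the table, so the analyst's query stays a plain `SELECT`.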
A Practical Stack for Mid-Sized Companies
You don’t need Databricks and a team of senior engineers to implement this. Here’s a stack that works well at the 10GB–500GB scale:
| Layer | Tool |
|---|---|
| Ingestion | Airbyte (open source) or custom scripts |
| Storage | Parquet files on S3 or local storage |
| Transformation | dbt (runs SQL, handles dependencies, generates docs) |
| Orchestration | Dagster (scheduling, monitoring, alerting) |
| Query engine | DuckDB (fast, free, SQL-compatible) |
| BI | Metabase or Superset (both open source) |
Total infrastructure cost for a typical 50-employee company: around $100–200/month in cloud storage and compute, versus $3,000–10,000/month for a comparable managed solution.
Why It Matters: The Reproducibility Problem
Here’s the practical problem that Medallion architecture solves: inconsistent numbers.
In organizations without a structured approach to data transformation, the same question asked of different systems gives different answers. Sales says revenue was $2.4M. Finance says $2.1M. The CEO wants to know which one is right, and nobody can explain the discrepancy quickly.
With a Medallion architecture, there’s one Silver table for revenue, built from one set of transformations, following one set of business rules. If the number looks wrong, you can trace it. If the rules need to change, you change them in one place and reprocess.
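The "one place" idea can be sketched in a few lines. In this hypothetical example the revenue rule lives in a single function, `is_revenue`, and both a sales-style breakdown and a finance-style total are derived from it, so the two numbers cannot drift apart. The sample orders, the function names, and the rule itself (cancelled orders do not count) are assumptions for illustration.

```python
# Hypothetical Silver order rows.
silver_orders = [
    {"order_id": "1001", "region": "North", "amount": 1500.0, "status": "paid"},
    {"order_id": "1002", "region": "South", "amount": 900.0, "status": "paid"},
    {"order_id": "1003", "region": "South", "amount": 300.0, "status": "cancelled"},
]

def is_revenue(order):
    """The one place the rule lives. Assumed rule: cancelled orders do not count."""
    return order["status"] != "cancelled"

def revenue_by_region(orders):
    """Sales-style view: revenue broken down by region."""
    totals = {}
    for order in orders:
        if is_revenue(order):
            totals[order["region"]] = totals.get(order["region"], 0.0) + order["amount"]
    return totals

def total_revenue(orders):
    """Finance-style view: a single company-wide figure."""
    return sum(order["amount"] for order in orders if is_revenue(order))

sales_view = revenue_by_region(silver_orders)
finance_view = total_revenue(silver_orders)

# Both views share the same rule, so they reconcile by construction.
print(sum(sales_view.values()) == finance_view)  # True
```

If the definition of revenue changes, only `is_revenue` is edited, and rerunning both views keeps them consistent with each other.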
One source of truth. Auditable. Reproducible.
Common Mistakes When Implementing
A few things to avoid:
Skipping Bronze: Some teams jump straight to a “clean” table without preserving the raw data. This seems efficient until you need to debug something or replay a historical period with different business logic.
Embedding business logic in dashboards: BI tools should display data, not transform it. If your revenue calculation lives inside a Tableau calculated field, it’s invisible, unversioned, and impossible to test.
Over-engineering Gold: Gold tables should answer specific, well-defined business questions. Building a single “master” Gold table that tries to answer everything usually produces a bloated, slow, hard-to-maintain monster.
At Sediment Data, we implement Medallion architectures for mid-sized companies using open-source tools — no vendor lock-in, no enterprise pricing. If your team is spending too much time fighting with inconsistent data, let’s talk about what a proper foundation looks like.
Is this a problem at your company?
Book a 30-minute call, no strings attached. We’ll walk you through how we can help you get your data infrastructure in order.