Databricks Metric Views

In this article, we will take a quick look at Databricks metric views - why we need them, how they work, and what problems they solve. Let's start from a simple dataset and build up from there.

The full source code for the demo is available in this GitHub Gist.

We have three tables: regions, customers, and orders. Three regions (EMEA, AMER, APAC), eight customers spread across them, and ten orders. Nothing fancy — just enough data to see the problem:

CREATE SCHEMA IF NOT EXISTS demo.metric_views;

USE demo.metric_views;

-- Regions
CREATE OR REPLACE TABLE regions (region_id INT, region_name STRING);

INSERT INTO regions
  VALUES (1, 'EMEA'), (2, 'AMER'), (3, 'APAC');

-- Customers
CREATE OR REPLACE TABLE customers (customer_id INT, customer_name STRING, region_id INT);

INSERT INTO customers
  VALUES
    (101, 'Acme Corp', 1), -- EMEA
    (102, 'Globex Ltd', 1), -- EMEA
    (103, 'Initech Inc', 2), -- AMER
    (104, 'Umbrella Co', 2), -- AMER
    (105, 'Wonka Industries', 2), -- AMER
    (106, 'Stark Enterprises', 3), -- APAC
    (107, 'Wayne Corp', 3), -- APAC
    (108, 'Oscorp', 3); -- APAC

-- Orders
CREATE OR REPLACE TABLE orders (
  order_id INT,
  customer_id INT,
  order_date DATE,
  amount DECIMAL(10, 2)
);

INSERT INTO orders
  VALUES
    (1, 101, '2025-01-15', 1000.00), -- Acme (EMEA)
    (2, 101, '2025-02-20', 1500.00), -- Acme (EMEA) — repeat customer
    (3, 102, '2025-01-10', 800.00), -- Globex (EMEA)
    (4, 103, '2025-03-05', 2000.00), -- Initech (AMER)
    (5, 104, '2025-03-12', 500.00), -- Umbrella (AMER)
    (6, 104, '2025-04-01', 700.00), -- Umbrella (AMER) — repeat customer
    (7, 105, '2025-02-28', 1200.00), -- Wonka (AMER)
    (8, 106, '2025-01-20', 3000.00), -- Stark (APAC)
    (9, 107, '2025-04-15', 900.00), -- Wayne (APAC)
    (10, 108, '2025-03-22', 1100.00); -- Oscorp (APAC)

-- Sanity check: row counts per table
SELECT
  'regions' AS tbl,
  COUNT(*) AS cnt
FROM
  regions
UNION ALL
SELECT
  'customers' AS tbl,
  COUNT(*) AS cnt
FROM
  customers
UNION ALL
SELECT
  'orders' AS tbl,
  COUNT(*) AS cnt
FROM
  orders;
tbl        cnt
regions      3
customers    8
orders      10

We also create a denormalized view that joins everything together. This will be our source for all the queries that follow:

CREATE OR REPLACE VIEW orders_denorm AS
SELECT
  o.order_id,
  o.customer_id,
  c.customer_name,
  r.region_id,
  r.region_name,
  o.order_date,
  o.amount
FROM
  orders o
    JOIN customers c
      ON o.customer_id = c.customer_id
    JOIN regions r
      ON c.region_id = r.region_id;
SELECT
  *
FROM
  orders_denorm
ORDER BY
  order_id;
order_id  customer_id  customer_name      region_id  region_name  order_date  amount
1         101          Acme Corp          1          EMEA         2025-01-15  1000.00
2         101          Acme Corp          1          EMEA         2025-02-20  1500.00
3         102          Globex Ltd         1          EMEA         2025-01-10   800.00
4         103          Initech Inc        2          AMER         2025-03-05  2000.00
5         104          Umbrella Co        2          AMER         2025-03-12   500.00
6         104          Umbrella Co        2          AMER         2025-04-01   700.00
7         105          Wonka Industries   2          AMER         2025-02-28  1200.00
8         106          Stark Enterprises  3          APAC         2025-01-20  3000.00
9         107          Wayne Corp         3          APAC         2025-04-15   900.00
10        108          Oscorp             3          APAC         2025-03-22  1100.00

Now let's build a simple KPI view — total revenue, distinct customers, and revenue per customer, grouped by region:

CREATE OR REPLACE VIEW region_kpis AS
SELECT
  r.region_name,
  SUM(o.amount) AS total_revenue,
  COUNT(DISTINCT o.customer_id) AS distinct_customers,
  SUM(o.amount) / COUNT(DISTINCT o.customer_id) AS revenue_per_customer
FROM
  orders o
    JOIN customers c
      ON o.customer_id = c.customer_id
    JOIN regions r
      ON c.region_id = r.region_id
GROUP BY
  r.region_name;
-- This view looks great per region:
SELECT
  *
FROM
  region_kpis
ORDER BY
  region_name;
region_name  total_revenue  distinct_customers  revenue_per_customer
AMER         4400.00        3                   1466.67
APAC         5000.00        3                   1666.67
EMEA         3300.00        2                   1650.00

Per region, this looks great. But what happens when someone wraps this view in another query to get a grand total?

-- THE PROBLEM — re-aggregating the view gives WRONG results
-- Someone wraps the view in another query to get a grand total...
SELECT
  SUM(total_revenue) AS grand_total_revenue,
  SUM(distinct_customers) AS grand_total_customers, -- WRONG in general: 3+3+2=8 matches here only because no customer spans regions
  SUM(revenue_per_customer) AS grand_revenue_per_customer -- WRONG!
FROM
  region_kpis;
-- WHY: Summing per-group ratios ≠ the global ratio.
--       Groups have different denominators (2, 3, 3 customers),
--       so the ratios aren't comparable units you can add up.
--       The ratio must be recomputed from the raw numerator and denominator.
--       (a1 + a2 + a3) / (b1 + b2 + b3) != (a1 / b1 + a2 / b2 + a3 / b3) ¯\_(ツ)_/¯
grand_total_revenue  grand_total_customers  grand_revenue_per_customer
12700.00             8                      4783.33

The revenue per customer number is wrong. We're summing three ratios with different denominators (2, 3, and 3 customers). That's not how math works.

And no, averaging doesn't fix it either:

-- You can't average averages!
SELECT
  AVG(revenue_per_customer) AS avg_revenue_per_customer -- STILL WRONG
FROM
  region_kpis;
-- Result: (1466.67 + 1666.67 + 1650.00) / 3 = 1594.44
-- Correct: 1587.50
-- Close-ish, but wrong. And the error grows with uneven group sizes.
-- You cannot safely re-aggregate ratio measures.
-- The measure must be recomputed from raw data at every level of aggregation.
avg_revenue_per_customer
1594.44

Close, but wrong, and the error grows as group sizes become more uneven. You cannot average averages. The takeaway: ratio measures cannot be safely re-aggregated; the metric must be recomputed from raw data at every level of aggregation.
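
To make the arithmetic concrete, here is a small Python sketch using the per-region numbers from above. Only the ratio of sums gives the right answer:

```python
# Per-region aggregates from the demo: (total_revenue, distinct_customers)
regions = {"EMEA": (3300.0, 2), "AMER": (4400.0, 3), "APAC": (5000.0, 3)}

ratios = [rev / cust for rev, cust in regions.values()]

# WRONG: summing or averaging the per-group ratios
sum_of_ratios = sum(ratios)                 # 4783.33
avg_of_ratios = sum(ratios) / len(ratios)   # 1594.44

# RIGHT: recompute the ratio from the raw numerator and denominator
grand_ratio = (
    sum(rev for rev, _ in regions.values())
    / sum(cust for _, cust in regions.values())
)  # 12700 / 8 = 1587.50

print(round(sum_of_ratios, 2), round(avg_of_ratios, 2), round(grand_ratio, 2))
```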


One traditional approach is GROUP BY CUBE, which computes every possible combination of dimensions:

-- With one dimension it's fine.
SELECT
  COALESCE(region_name, 'All Regions') AS region_name,
  SUM(amount) AS total_revenue,
  COUNT(DISTINCT customer_id) AS distinct_customers,
  SUM(amount) / COUNT(DISTINCT customer_id) AS revenue_per_customer,
  region_name AS _sort
FROM
  orders_denorm
GROUP BY
  CUBE (region_name)
ORDER BY
  _sort NULLS LAST;
region_name  total_revenue  distinct_customers  revenue_per_customer  _sort
AMER         4400.00        3                   1466.67               AMER
APAC         5000.00        3                   1666.67               APAC
EMEA         3300.00        2                   1650.00               EMEA
All Regions  12700.00       8                   1587.50               null

With one dimension, this is fine. But CUBE computes every subset of the grouped columns: three dimensions produce 2^3 = 8 grouping levels, and ten dimensions already produce 2^10 = 1024. That can quickly turn into millions of rows, most of which nobody ever queries. It does not scale:

-- CUBE with 3 dimensions produces 2^3 = 8 grouping levels:
-- every possible subset of {region, customer, date}.
-- With 10 source rows this is manageable, but in production
-- with 10+ dimensions you'd get 1024+ grouping levels and
-- millions of rows — most of which nobody ever queries.
SELECT
  COALESCE(region_name, 'All Regions') AS region_name,
  COALESCE(customer_name, 'All Customers') AS customer_name,
  COALESCE(CAST(order_date AS STRING), 'All Dates') AS order_date,
  SUM(amount) AS total_revenue,
  COUNT(DISTINCT customer_id) AS distinct_customers,
  SUM(amount) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM
  orders_denorm
GROUP BY
  CUBE (region_name, customer_name, order_date)
ORDER BY
  region_name,
  customer_name,
  order_date;

The query above returns 60 rows: one row per distinct combination in each of the eight grouping sets (1 + 3 + 8 + 10 + 8 + 10 + 10 + 10).
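
The 60-row figure can be double-checked outside the warehouse. Here is a small Python sketch that replays the CUBE logic over the ten demo orders:

```python
from itertools import combinations

# (region_name, customer_name, order_date) for the ten demo orders
orders = [
    ("EMEA", "Acme Corp", "2025-01-15"),
    ("EMEA", "Acme Corp", "2025-02-20"),
    ("EMEA", "Globex Ltd", "2025-01-10"),
    ("AMER", "Initech Inc", "2025-03-05"),
    ("AMER", "Umbrella Co", "2025-03-12"),
    ("AMER", "Umbrella Co", "2025-04-01"),
    ("AMER", "Wonka Industries", "2025-02-28"),
    ("APAC", "Stark Enterprises", "2025-01-20"),
    ("APAC", "Wayne Corp", "2025-04-15"),
    ("APAC", "Oscorp", "2025-03-22"),
]

# CUBE emits one grouping set per subset of the grouped columns,
# and each grouping set yields one row per distinct combination.
total_rows = 0
for size in range(4):  # subset sizes 0..3
    for cols in combinations(range(3), size):
        total_rows += len({tuple(row[c] for c in cols) for row in orders})

print(total_rows)  # 60
print(2 ** 10)     # with 10 dimensions: 1024 grouping sets before any data
```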


This is where metric views come into play. You define your dimensions and measures once, and the engine recomputes everything correctly at query time, no matter how you slice:

CREATE OR REPLACE VIEW order_metrics_view WITH METRICS LANGUAGE YAML AS
$$
version: "1.1"
source: demo.metric_views.orders_denorm
dimensions:
  - name: region_name
    expr: region_name
    display_name: Region
  - name: customer_name
    expr: customer_name
    display_name: Customer
  - name: order_date
    expr: order_date
    display_name: Order Date
measures:
  - name: total_revenue
    expr: SUM(amount)
    display_name: Total Revenue
  - name: customer_count
    expr: COUNT(DISTINCT customer_id)
    display_name: Distinct Customers
  - name: order_count
    expr: COUNT(order_id)
    display_name: Order Count
  - name: revenue_per_customer
    expr: SUM(amount) / COUNT(DISTINCT customer_id)
    display_name: Revenue per Customer
$$;

The display_name fields are part of Databricks semantic metadata for metric views.

Now query it by region. Measures must be wrapped in the MEASURE function so the engine can evaluate them at the query's grouping level:

-- Query by region — same results as the regular view
-- Measures MUST be wrapped in MEASURE()
SELECT
  region_name,
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(customer_count) AS distinct_customers,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM
  order_metrics_view
GROUP BY
  region_name
ORDER BY
  region_name;
-- Expected:
--   AMER:  4400 / 3 = 1466.67
--   APAC:  5000 / 3 = 1666.67
--   EMEA:  3300 / 2 = 1650.00
region_name  total_revenue  distinct_customers  revenue_per_customer
AMER         4400.00        3                   1466.67
APAC         5000.00        3                   1666.67
EMEA         3300.00        2                   1650.00

For the grand total, we just omit the GROUP BY clause:

-- Grand total:
-- No GROUP BY -> the engine computes from ALL raw data
-- -----------------------------------------------------------------------------
SELECT
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(customer_count) AS distinct_customers,
  MEASURE(revenue_per_customer) AS revenue_per_customer -- The ratio is recomputed, not summed!
FROM
  order_metrics_view;
-- Expected: 12700 / 8 = 1587.50  <- CORRECT
total_revenue  distinct_customers  revenue_per_customer
12700.00       8                   1587.50
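
Conceptually, MEASURE() defers aggregation to query time. Here is a rough Python sketch of that behavior (the helper names are hypothetical, this is not the real engine): measures are stored once as expressions over raw rows and re-evaluated at whatever grain the query asks for:

```python
# Raw order rows: (customer_id, region_name, amount), from the demo inserts
rows = [
    (101, "EMEA", 1000.0), (101, "EMEA", 1500.0), (102, "EMEA", 800.0),
    (103, "AMER", 2000.0), (104, "AMER", 500.0), (104, "AMER", 700.0),
    (105, "AMER", 1200.0), (106, "APAC", 3000.0), (107, "APAC", 900.0),
    (108, "APAC", 1100.0),
]

# Measures defined ONCE, as functions over raw rows (like the YAML exprs)
measures = {
    "total_revenue": lambda rs: sum(a for _, _, a in rs),
    "customer_count": lambda rs: len({c for c, _, _ in rs}),
    "revenue_per_customer": lambda rs: sum(a for _, _, a in rs)
    / len({c for c, _, _ in rs}),
}

def measure_grand(name):
    """Evaluate a measure over ALL raw rows (no GROUP BY)."""
    return measures[name](rows)

def measure_by_region(name):
    """Evaluate the SAME measure per region (GROUP BY region_name)."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[1], []).append(row)  # row[1] = region_name
    return {region: measures[name](rs) for region, rs in grouped.items()}

print(measure_grand("revenue_per_customer"))      # 12700 / 8 = 1587.5
print(measure_by_region("revenue_per_customer"))  # per-region ratios
```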

We can also group by a different dimension without rewriting the view:

-- Group by a different dimension — no view rewrite needed
SELECT
  customer_name,
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(order_count) AS orders,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM
  order_metrics_view
GROUP BY
  customer_name
ORDER BY
  total_revenue DESC;
-- Revenue per customer when grouped by customer is just their total revenue / 1,
-- because each group contains exactly one distinct customer
customer_name      total_revenue  orders  revenue_per_customer
Stark Enterprises  3000.00        1       3000.00
Acme Corp          2500.00        2       2500.00
Initech Inc        2000.00        1       2000.00
Wonka Industries   1200.00        1       1200.00
Umbrella Co        1200.00        2       1200.00
Oscorp             1100.00        1       1100.00
Wayne Corp          900.00        1        900.00
Globex Ltd          800.00        1        800.00

And we can filter by specific regions, too:

-- Filtered query — also correct automatically
SELECT
  region_name,
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM
  order_metrics_view
WHERE
  region_name IN ('EMEA', 'AMER')
GROUP BY
  region_name;
region_name  total_revenue  revenue_per_customer
EMEA         3300.00        1650.00
AMER         4400.00        1466.67

And here is the grand total for those two regions:

-- Grand total for just EMEA + AMER:
SELECT
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(customer_count) AS distinct_customers,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM
  order_metrics_view
WHERE
  region_name IN ('EMEA', 'AMER');
-- Expected: (3300 + 4400) / (2 + 3) = 7700 / 5 = 1540.00
-- MEASURE() tells the engine to recompute the expression at whatever grain the query defines.
-- You define the metric ONCE and slice it freely
total_revenue  distinct_customers  revenue_per_customer
7700.00        5                   1540.00
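
The same recompute-at-the-filtered-grain behavior is easy to check by hand in Python: the filter is applied to the raw rows first, and only then is the ratio evaluated:

```python
# Raw order rows: (customer_id, region_name, amount), from the demo inserts
rows = [
    (101, "EMEA", 1000.0), (101, "EMEA", 1500.0), (102, "EMEA", 800.0),
    (103, "AMER", 2000.0), (104, "AMER", 500.0), (104, "AMER", 700.0),
    (105, "AMER", 1200.0), (106, "APAC", 3000.0), (107, "APAC", 900.0),
    (108, "APAC", 1100.0),
]

# WHERE region_name IN ('EMEA', 'AMER') filters the RAW rows first...
filtered = [r for r in rows if r[1] in ("EMEA", "AMER")]

# ...and the ratio measure is then evaluated at the filtered grain
revenue = sum(a for _, _, a in filtered)      # 7700.0
customers = len({c for c, _, _ in filtered})  # 5
print(revenue / customers)                    # 1540.0
```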

Materialization

Aggregated Materialization

For dashboards that repeatedly hit the same slices, metric view materialization can speed things up. According to the docs, this feature is experimental and requires serverless compute plus Databricks Runtime 17.2 or above. There are two types: aggregated and unaggregated materialization:

-- Aggregated Materialization
-- Pre-compute specific dimension + measure combinations.
-- "these are my hot dashboard queries."
CREATE OR REPLACE VIEW order_metrics_mat_agg WITH METRICS LANGUAGE YAML AS
$$
version: "1.1"
source: demo.metric_views.orders_denorm
dimensions:
  - name: region_name
    expr: region_name
  - name: customer_name
    expr: customer_name
  - name: order_date
    expr: order_date
measures:
  - name: total_revenue
    expr: SUM(amount)
  - name: customer_count
    expr: COUNT(DISTINCT customer_id)
  - name: order_count
    expr: COUNT(order_id)
  - name: revenue_per_customer
    expr: SUM(amount) / COUNT(DISTINCT customer_id)
materialization:
  mode: "relaxed"
  schedule: "EVERY 1 HOUR"
  materialized_views:
    - name: mv_by_region
      type: aggregated
      dimensions:
        - region_name
      measures:
        - total_revenue
        - customer_count
        - revenue_per_customer
$$;

Let's try this query. Once the materialization is available, the optimizer can serve it from the materialized result:

-- this SHOULD hit the materialized view (fast path)
-- REFRESH MATERIALIZED VIEW order_metrics_mat_agg;
SELECT
  region_name,
  MEASURE(total_revenue),
  MEASURE(revenue_per_customer)
FROM
  order_metrics_mat_agg
GROUP BY
  region_name;

In the plan, we can see the optimizer using the materialized path:

[Query plan screenshot]

Now let's try another query. It falls back to the source data because the customer_name dimension is not materialized:

-- this falls back to source data (no materialization covers this combo)
-- REFRESH MATERIALIZED VIEW order_metrics_mat_agg;
SELECT
  customer_name,
  MEASURE(total_revenue)
FROM
  order_metrics_mat_agg
GROUP BY
  customer_name;

This query falls back to the source data. The calculations are still correct; they are just not precomputed:

[Query plan screenshot]

We can also add a separate materialization for customer_name:

CREATE OR REPLACE VIEW order_metrics_mat_agg WITH METRICS LANGUAGE YAML AS
$$
version: "1.1"
source: demo.metric_views.orders_denorm
dimensions:
  - name: region_name
    expr: region_name
  - name: customer_name
    expr: customer_name
  - name: order_date
    expr: order_date
measures:
  - name: total_revenue
    expr: SUM(amount)
  - name: customer_count
    expr: COUNT(DISTINCT customer_id)
  - name: order_count
    expr: COUNT(order_id)
  - name: revenue_per_customer
    expr: SUM(amount) / COUNT(DISTINCT customer_id)
materialization:
  mode: "relaxed"
  schedule: "EVERY 1 HOUR"
  materialized_views:
    - name: mv_by_region
      type: aggregated
      dimensions:
        - region_name
      measures:
        - total_revenue
        - customer_count
        - revenue_per_customer
    # Separate materialized view at customer grain.
    # Can't just add customer_name to mv_by_region — that changes
    # the grain to (region, customer), and ratio measures like
    # revenue_per_customer can't be safely re-aggregated from
    # a finer grain back to region-only. One grain = one MV.
    - name: mv_by_customer
      type: aggregated
      dimensions:
        - customer_name
      measures:
        - total_revenue
$$;
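
The grain comment in the YAML above is the same ratio pitfall in disguise. A quick Python sketch of why a distinct count computed at a finer grain cannot be rolled up (using the AMER orders from the demo data):

```python
# AMER orders: (customer_id, order_date); customer 104 ordered twice
amer_orders = [
    (103, "2025-03-05"),
    (104, "2025-03-12"),
    (104, "2025-04-01"),
    (105, "2025-02-28"),
]

# Distinct customers computed at the finer (region, date) grain...
per_date = {}
for cust, date in amer_orders:
    per_date.setdefault(date, set()).add(cust)

# ...cannot be summed back up to the region grain:
rolled_up = sum(len(custs) for custs in per_date.values())  # 4 (counts 104 twice)
true_count = len({cust for cust, _ in amer_orders})         # 3

print(rolled_up, true_count)  # rolling up a finer grain overcounts repeat customers
```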

After the refresh completes, this query can use the new materialized result.

One caveat: right after you change the view, you may still see the old plan because the materialization pipeline runs asynchronously in the background and needs time to finish.

You can also trigger view materialization manually:

REFRESH MATERIALIZED VIEW order_metrics_mat_agg;

The more dimensions you materialize together, the more compute and storage the system needs, so use this feature selectively.

Unaggregated Materialization

Another materialization type is unaggregated. With this approach, we can pre-materialize the joined data model (fact plus dimensions flattened). It is very useful when joins are the expensive part, not the aggregation:

-- Unaggregated Materialization
-- Pre-materialize the joined data model (fact + dimensions flattened).
-- Useful when joins are the expensive part, not the aggregation.
CREATE OR REPLACE VIEW order_metrics_mat_unagg WITH METRICS LANGUAGE YAML AS
$$
version: "1.1"
source: demo.metric_views.orders_denorm
dimensions:
  - name: region_name
    expr: region_name
  - name: customer_name
    expr: customer_name
  - name: order_date
    expr: order_date
measures:
  - name: total_revenue
    expr: SUM(amount)
  - name: customer_count
    expr: COUNT(DISTINCT customer_id)
  - name: revenue_per_customer
    expr: SUM(amount) / COUNT(DISTINCT customer_id)
materialization:
  mode: "relaxed"
  schedule: "EVERY 1 HOUR"
  materialized_views:
    - name: mv_unaggregated
      type: unaggregated
$$;
-- Both of these queries benefit from the pre-joined data:
SELECT
  region_name,
  MEASURE(total_revenue)
FROM
  order_metrics_mat_unagg
GROUP BY
  region_name;

SELECT
  customer_name,
  MEASURE(total_revenue)
FROM
  order_metrics_mat_unagg
GROUP BY
  customer_name;

That is Databricks metric views in a nutshell. The feature set is still evolving, but it is already a strong way to define reusable business metrics in one place.

Power BI support for metric views is currently in Beta and relies on BI compatibility mode, which rewrites standard Power BI aggregations to the underlying metric view definition. The docs also note that metric views with materializations are not supported in that mode.

Metric views are Unity Catalog securable objects, so you can manage permissions there, and Unity Catalog lineage helps you trace how data is produced and consumed.