Databricks Metric Views

In this article, we will take a quick look at Databricks metric views - why we need them, how they work, and what problems they solve. Let's start from a simple dataset and build up from there.

The full source code for the demo is available in this GitHub Gist.

We have three tables: regions, customers, and orders. Three regions (EMEA, AMER, APAC), eight customers spread across them, and ten orders. Nothing fancy — just enough data to see the problem:

CREATE SCHEMA IF NOT EXISTS demo.metric_views;

USE demo.metric_views;

-- Regions
CREATE OR REPLACE TABLE regions (region_id INT, region_name STRING);

INSERT INTO regions
  VALUES (1, 'EMEA'), (2, 'AMER'), (3, 'APAC');

-- Customers
CREATE OR REPLACE TABLE customers (customer_id INT, customer_name STRING, region_id INT);

INSERT INTO customers
  VALUES
    (101, 'Acme Corp', 1), -- EMEA
    (102, 'Globex Ltd', 1), -- EMEA
    (103, 'Initech Inc', 2), -- AMER
    (104, 'Umbrella Co', 2), -- AMER
    (105, 'Wonka Industries', 2), -- AMER
    (106, 'Stark Enterprises', 3), -- APAC
    (107, 'Wayne Corp', 3), -- APAC
    (108, 'Oscorp', 3); -- APAC

-- Orders
CREATE OR REPLACE TABLE orders (
  order_id INT,
  customer_id INT,
  order_date DATE,
  amount DECIMAL(10, 2)
);

INSERT INTO orders
  VALUES
    (1, 101, '2025-01-15', 1000.00), -- Acme (EMEA)
    (2, 101, '2025-02-20', 1500.00), -- Acme (EMEA) — repeat customer
    (3, 102, '2025-01-10', 800.00), -- Globex (EMEA)
    (4, 103, '2025-03-05', 2000.00), -- Initech (AMER)
    (5, 104, '2025-03-12', 500.00), -- Umbrella (AMER)
    (6, 104, '2025-04-01', 700.00), -- Umbrella (AMER) — repeat customer
    (7, 105, '2025-02-28', 1200.00), -- Wonka (AMER)
    (8, 106, '2025-01-20', 3000.00), -- Stark (APAC)
    (9, 107, '2025-04-15', 900.00), -- Wayne (APAC)
    (10, 108, '2025-03-22', 1100.00); -- Oscorp (APAC)

-- Sanity check: row counts per table
SELECT
  'regions' AS tbl,
  COUNT(*) AS cnt
FROM
  regions
UNION ALL
SELECT
  'customers' AS tbl,
  COUNT(*) AS cnt
FROM
  customers
UNION ALL
SELECT
  'orders' AS tbl,
  COUNT(*) AS cnt
FROM
  orders;
tbl        cnt
regions      3
customers    8
orders      10

We also create a denormalized view that joins everything together. This will be our source for all the queries that follow:

CREATE OR REPLACE VIEW orders_denorm AS
SELECT
  o.order_id,
  o.customer_id,
  c.customer_name,
  r.region_id,
  r.region_name,
  o.order_date,
  o.amount
FROM
  orders o
    JOIN customers c
      ON o.customer_id = c.customer_id
    JOIN regions r
      ON c.region_id = r.region_id;
SELECT
  *
FROM
  orders_denorm
ORDER BY
  order_id;
order_id  customer_id  customer_name      region_id  region_name  order_date  amount
1         101          Acme Corp          1          EMEA         2025-01-15  1000.00
2         101          Acme Corp          1          EMEA         2025-02-20  1500.00
3         102          Globex Ltd         1          EMEA         2025-01-10   800.00
4         103          Initech Inc        2          AMER         2025-03-05  2000.00
5         104          Umbrella Co        2          AMER         2025-03-12   500.00
6         104          Umbrella Co        2          AMER         2025-04-01   700.00
7         105          Wonka Industries   2          AMER         2025-02-28  1200.00
8         106          Stark Enterprises  3          APAC         2025-01-20  3000.00
9         107          Wayne Corp         3          APAC         2025-04-15   900.00
10        108          Oscorp             3          APAC         2025-03-22  1100.00

Now let's build a simple KPI view — total revenue, distinct customers, and revenue per customer, grouped by region:

CREATE OR REPLACE VIEW region_kpis AS
SELECT
  r.region_name,
  SUM(o.amount) AS total_revenue,
  COUNT(DISTINCT o.customer_id) AS distinct_customers,
  SUM(o.amount) / COUNT(DISTINCT o.customer_id) AS revenue_per_customer
FROM
  orders o
    JOIN customers c
      ON o.customer_id = c.customer_id
    JOIN regions r
      ON c.region_id = r.region_id
GROUP BY
  r.region_name;
-- This view looks great per region:
SELECT
  *
FROM
  region_kpis
ORDER BY
  region_name;
region_name  total_revenue  distinct_customers  revenue_per_customer
AMER         4400.00        3                   1466.67
APAC         5000.00        3                   1666.67
EMEA         3300.00        2                   1650.00

Per region, this looks great. But what happens when someone wraps this view in another query to get a grand total?

-- THE PROBLEM — re-aggregating the view gives WRONG results
-- Someone wraps the view in another query to get a grand total...
SELECT
  SUM(total_revenue) AS grand_total_revenue,
  SUM(distinct_customers) AS grand_total_customers, -- WRONG in general: 3+3+2=8 matches here only because no customer spans regions
  SUM(revenue_per_customer) AS grand_revenue_per_customer -- WRONG!
FROM
  region_kpis;
-- WHY: Summing per-group ratios ≠ the global ratio.
--       Groups have different denominators (2, 3, 3 customers),
--       so the ratios aren't comparable units you can add up.
--       The ratio must be recomputed from the raw numerator and denominator.
--       (a1 + a2 + a3) / (b1 + b2 + b3) != (a1 / b1 + a2 / b2 + a3 / b3) ¯\_(ツ)_/¯
grand_total_revenue  grand_total_customers  grand_revenue_per_customer
12700.00             8                      4783.33

The revenue per customer number is wrong. We're summing three ratios with different denominators (2, 3, and 3 customers). That's not how math works.

And no, averaging doesn't fix it either:

-- You can't average averages!
SELECT
  AVG(revenue_per_customer) AS avg_revenue_per_customer -- STILL WRONG
FROM
  region_kpis;
-- Result: (1466.67 + 1666.67 + 1650.00) / 3 = 1594.44
-- Correct: 1587.50
-- Close-ish, but wrong. And the error grows with uneven group sizes.
-- You cannot safely re-aggregate ratio measures.
-- The measure must be recomputed from raw data at every level of aggregation.
avg_revenue_per_customer
1594.44

Close, but wrong, and the error grows as group sizes become more uneven. You cannot average averages. The takeaway: ratio measures cannot be safely re-aggregated; the metric must be recomputed from raw data at every level of aggregation.
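
To make the arithmetic concrete, here is a small Python sketch using the per-region numbers from above. Only the ratio of sums gives the right answer:

```python
# Per-region aggregates from the demo: (total_revenue, distinct_customers)
regions = {"EMEA": (3300.0, 2), "AMER": (4400.0, 3), "APAC": (5000.0, 3)}

ratios = [rev / cust for rev, cust in regions.values()]

# WRONG: summing or averaging the per-group ratios
sum_of_ratios = sum(ratios)                 # 4783.33
avg_of_ratios = sum(ratios) / len(ratios)   # 1594.44

# RIGHT: recompute the ratio from the raw numerator and denominator
grand_ratio = (
    sum(rev for rev, _ in regions.values())
    / sum(cust for _, cust in regions.values())
)  # 12700 / 8 = 1587.50

print(round(sum_of_ratios, 2), round(avg_of_ratios, 2), round(grand_ratio, 2))
```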


One traditional approach is GROUP BY CUBE, which computes every possible combination of dimensions:

-- With one dimension it's fine.
SELECT
  COALESCE(region_name, 'All Regions') AS region_name,
  SUM(amount) AS total_revenue,
  COUNT(DISTINCT customer_id) AS distinct_customers,
  SUM(amount) / COUNT(DISTINCT customer_id) AS revenue_per_customer,
  region_name AS _sort
FROM
  orders_denorm
GROUP BY
  CUBE (region_name)
ORDER BY
  _sort NULLS LAST;
region_name  total_revenue  distinct_customers  revenue_per_customer  _sort
AMER         4400.00        3                   1466.67               AMER
APAC         5000.00        3                   1666.67               APAC
EMEA         3300.00        2                   1650.00               EMEA
All Regions  12700.00       8                   1587.50               null

With one dimension, this is fine. But CUBE computes every subset of the grouped columns: three dimensions produce 2^3 = 8 grouping levels, and ten dimensions already produce 2^10 = 1024. That can quickly turn into millions of rows, most of which nobody ever queries. It does not scale:

-- CUBE with 3 dimensions produces 2^3 = 8 grouping levels:
-- every possible subset of {region, customer, date}.
-- With 10 source rows this is manageable, but in production
-- with 10+ dimensions you'd get 1024+ grouping levels and
-- millions of rows — most of which nobody ever queries.
SELECT
  COALESCE(region_name, 'All Regions') AS region_name,
  COALESCE(customer_name, 'All Customers') AS customer_name,
  COALESCE(CAST(order_date AS STRING), 'All Dates') AS order_date,
  SUM(amount) AS total_revenue,
  COUNT(DISTINCT customer_id) AS distinct_customers,
  SUM(amount) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM
  orders_denorm
GROUP BY
  CUBE (region_name, customer_name, order_date)
ORDER BY
  region_name,
  customer_name,
  order_date;

The query above returns 60 rows: one row per distinct combination in each of the eight grouping sets (1 + 3 + 8 + 10 + 8 + 10 + 10 + 10).
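
The 60-row figure can be double-checked outside the warehouse. Here is a small Python sketch that replays the CUBE logic over the ten demo orders:

```python
from itertools import combinations

# (region_name, customer_name, order_date) for the ten demo orders
orders = [
    ("EMEA", "Acme Corp", "2025-01-15"),
    ("EMEA", "Acme Corp", "2025-02-20"),
    ("EMEA", "Globex Ltd", "2025-01-10"),
    ("AMER", "Initech Inc", "2025-03-05"),
    ("AMER", "Umbrella Co", "2025-03-12"),
    ("AMER", "Umbrella Co", "2025-04-01"),
    ("AMER", "Wonka Industries", "2025-02-28"),
    ("APAC", "Stark Enterprises", "2025-01-20"),
    ("APAC", "Wayne Corp", "2025-04-15"),
    ("APAC", "Oscorp", "2025-03-22"),
]

# CUBE emits one grouping set per subset of the grouped columns,
# and each grouping set yields one row per distinct combination.
total_rows = 0
for size in range(4):  # subset sizes 0..3
    for cols in combinations(range(3), size):
        total_rows += len({tuple(row[c] for c in cols) for row in orders})

print(total_rows)  # 60
print(2 ** 10)     # with 10 dimensions: 1024 grouping sets before any data
```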


This is where metric views come into play. You define your dimensions and measures once, and the engine recomputes everything correctly at query time, no matter how you slice:

CREATE OR REPLACE VIEW order_metrics_view WITH METRICS LANGUAGE YAML AS
$$
version: "1.1"
source: demo.metric_views.orders_denorm
dimensions:
  - name: region_name
    expr: region_name
    display_name: Region
  - name: customer_name
    expr: customer_name
    display_name: Customer
  - name: order_date
    expr: order_date
    display_name: Order Date
measures:
  - name: total_revenue
    expr: SUM(amount)
    display_name: Total Revenue
  - name: customer_count
    expr: COUNT(DISTINCT customer_id)
    display_name: Distinct Customers
  - name: order_count
    expr: COUNT(order_id)
    display_name: Order Count
  - name: revenue_per_customer
    expr: SUM(amount) / COUNT(DISTINCT customer_id)
    display_name: Revenue per Customer
$$;

The display_name fields are part of Databricks semantic metadata for metric views.

Now query it by region. Measures must be wrapped in the MEASURE function so the engine can evaluate them at the query's grouping level:

-- Query by region — same results as the regular view
-- Measures MUST be wrapped in MEASURE()
SELECT
  region_name,
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(customer_count) AS distinct_customers,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM
  order_metrics_view
GROUP BY
  region_name
ORDER BY
  region_name;
-- Expected:
--   AMER:  4400 / 3 = 1466.67
--   APAC:  5000 / 3 = 1666.67
--   EMEA:  3300 / 2 = 1650.00
region_name  total_revenue  distinct_customers  revenue_per_customer
AMER         4400.00        3                   1466.67
APAC         5000.00        3                   1666.67
EMEA         3300.00        2                   1650.00

For the grand total, we just omit the GROUP BY clause:

-- Grand total:
-- No GROUP BY -> the engine computes from ALL raw data
-- -----------------------------------------------------------------------------
SELECT
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(customer_count) AS distinct_customers,
  MEASURE(revenue_per_customer) AS revenue_per_customer -- The ratio is recomputed, not summed!
FROM
  order_metrics_view;
-- Expected: 12700 / 8 = 1587.50  <- CORRECT
total_revenue  distinct_customers  revenue_per_customer
12700.00       8                   1587.50
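
Conceptually, MEASURE() defers aggregation to query time. Here is a rough Python sketch of that behavior (the helper names are hypothetical, this is not the real engine): measures are stored once as expressions over raw rows and re-evaluated at whatever grain the query asks for:

```python
# Raw order rows: (customer_id, region_name, amount), from the demo inserts
rows = [
    (101, "EMEA", 1000.0), (101, "EMEA", 1500.0), (102, "EMEA", 800.0),
    (103, "AMER", 2000.0), (104, "AMER", 500.0), (104, "AMER", 700.0),
    (105, "AMER", 1200.0), (106, "APAC", 3000.0), (107, "APAC", 900.0),
    (108, "APAC", 1100.0),
]

# Measures defined ONCE, as functions over raw rows (like the YAML exprs)
measures = {
    "total_revenue": lambda rs: sum(a for _, _, a in rs),
    "customer_count": lambda rs: len({c for c, _, _ in rs}),
    "revenue_per_customer": lambda rs: sum(a for _, _, a in rs)
    / len({c for c, _, _ in rs}),
}

def measure_grand(name):
    """Evaluate a measure over ALL raw rows (no GROUP BY)."""
    return measures[name](rows)

def measure_by_region(name):
    """Evaluate the SAME measure per region (GROUP BY region_name)."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[1], []).append(row)  # row[1] = region_name
    return {region: measures[name](rs) for region, rs in grouped.items()}

print(measure_grand("revenue_per_customer"))      # 12700 / 8 = 1587.5
print(measure_by_region("revenue_per_customer"))  # per-region ratios
```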

We can also group by a different dimension without rewriting the view:

-- Group by a different dimension — no view rewrite needed
SELECT
  customer_name,
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(order_count) AS orders,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM
  order_metrics_view
GROUP BY
  customer_name
ORDER BY
  total_revenue DESC;
-- Revenue per customer when grouped by customer is just their total revenue / 1,
-- because each group contains exactly one distinct customer
customer_name      total_revenue  orders  revenue_per_customer
Stark Enterprises  3000.00        1       3000.00
Acme Corp          2500.00        2       2500.00
Initech Inc        2000.00        1       2000.00
Wonka Industries   1200.00        1       1200.00
Umbrella Co        1200.00        2       1200.00
Oscorp             1100.00        1       1100.00
Wayne Corp          900.00        1        900.00
Globex Ltd          800.00        1        800.00

And we can filter by specific regions, too:

-- Filtered query — also correct automatically
SELECT
  region_name,
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM
  order_metrics_view
WHERE
  region_name IN ('EMEA', 'AMER')
GROUP BY
  region_name;
region_name  total_revenue  revenue_per_customer
EMEA         3300.00        1650.00
AMER         4400.00        1466.67

And here is the grand total for those two regions:

-- Grand total for just EMEA + AMER:
SELECT
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(customer_count) AS distinct_customers,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM
  order_metrics_view
WHERE
  region_name IN ('EMEA', 'AMER');
-- Expected: (3300 + 4400) / (2 + 3) = 7700 / 5 = 1540.00
-- MEASURE() tells the engine to recompute the expression at whatever grain the query defines.
-- You define the metric ONCE and slice it freely
total_revenue  distinct_customers  revenue_per_customer
7700.00        5                   1540.00
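
The same recompute-at-the-filtered-grain behavior is easy to check by hand in Python: the filter is applied to the raw rows first, and only then is the ratio evaluated:

```python
# Raw order rows: (customer_id, region_name, amount), from the demo inserts
rows = [
    (101, "EMEA", 1000.0), (101, "EMEA", 1500.0), (102, "EMEA", 800.0),
    (103, "AMER", 2000.0), (104, "AMER", 500.0), (104, "AMER", 700.0),
    (105, "AMER", 1200.0), (106, "APAC", 3000.0), (107, "APAC", 900.0),
    (108, "APAC", 1100.0),
]

# WHERE region_name IN ('EMEA', 'AMER') filters the RAW rows first...
filtered = [r for r in rows if r[1] in ("EMEA", "AMER")]

# ...and the ratio measure is then evaluated at the filtered grain
revenue = sum(a for _, _, a in filtered)      # 7700.0
customers = len({c for c, _, _ in filtered})  # 5
print(revenue / customers)                    # 1540.0
```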

Materialization

Aggregated Materialization

For dashboards that repeatedly hit the same slices, metric view materialization can speed things up. According to the docs, this feature is experimental and requires serverless compute plus Databricks Runtime 17.2 or above. There are two types: aggregated and unaggregated materialization:

-- Aggregated Materialization
-- Pre-compute specific dimension + measure combinations.
-- "these are my hot dashboard queries."
CREATE OR REPLACE VIEW order_metrics_mat_agg WITH METRICS LANGUAGE YAML AS
$$
version: "1.1"
source: demo.metric_views.orders_denorm
dimensions:
  - name: region_name
    expr: region_name
  - name: customer_name
    expr: customer_name
  - name: order_date
    expr: order_date
measures:
  - name: total_revenue
    expr: SUM(amount)
  - name: customer_count
    expr: COUNT(DISTINCT customer_id)
  - name: order_count
    expr: COUNT(order_id)
  - name: revenue_per_customer
    expr: SUM(amount) / COUNT(DISTINCT customer_id)
materialization:
  mode: "relaxed"
  schedule: "EVERY 1 HOUR"
  materialized_views:
    - name: mv_by_region
      type: aggregated
      dimensions:
        - region_name
      measures:
        - total_revenue
        - customer_count
        - revenue_per_customer
$$;

Let's try this query. Once the materialization is available, the optimizer can serve it from the materialized result:

-- this SHOULD hit the materialized view (fast path)
-- REFRESH MATERIALIZED VIEW order_metrics_mat_agg;
SELECT
  region_name,
  MEASURE(total_revenue),
  MEASURE(revenue_per_customer)
FROM
  order_metrics_mat_agg
GROUP BY
  region_name;

In the plan, we can see the optimizer using the materialized path:

[Query plan screenshot]

Now let's try another query. It falls back to the source data because the customer_name dimension is not materialized:

-- this falls back to source data (no materialization covers this combo)
-- REFRESH MATERIALIZED VIEW order_metrics_mat_agg;
SELECT
  customer_name,
  MEASURE(total_revenue)
FROM
  order_metrics_mat_agg
GROUP BY
  customer_name;

This query falls back to the source data. The calculations are still correct; they are just not precomputed:

[Query plan screenshot]

We can also add a separate materialization for customer_name:

CREATE OR REPLACE VIEW order_metrics_mat_agg WITH METRICS LANGUAGE YAML AS
$$
version: "1.1"
source: demo.metric_views.orders_denorm
dimensions:
  - name: region_name
    expr: region_name
  - name: customer_name
    expr: customer_name
  - name: order_date
    expr: order_date
measures:
  - name: total_revenue
    expr: SUM(amount)
  - name: customer_count
    expr: COUNT(DISTINCT customer_id)
  - name: order_count
    expr: COUNT(order_id)
  - name: revenue_per_customer
    expr: SUM(amount) / COUNT(DISTINCT customer_id)
materialization:
  mode: "relaxed"
  schedule: "EVERY 1 HOUR"
  materialized_views:
    - name: mv_by_region
      type: aggregated
      dimensions:
        - region_name
      measures:
        - total_revenue
        - customer_count
        - revenue_per_customer
    # Separate materialized view at customer grain.
    # Can't just add customer_name to mv_by_region — that changes
    # the grain to (region, customer), and ratio measures like
    # revenue_per_customer can't be safely re-aggregated from
    # a finer grain back to region-only. One grain = one MV.
    - name: mv_by_customer
      type: aggregated
      dimensions:
        - customer_name
      measures:
        - total_revenue
$$;
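
The grain comment in the YAML above is the same ratio pitfall in disguise. A quick Python sketch of why a distinct count computed at a finer grain cannot be rolled up (using the AMER orders from the demo data):

```python
# AMER orders: (customer_id, order_date); customer 104 ordered twice
amer_orders = [
    (103, "2025-03-05"),
    (104, "2025-03-12"),
    (104, "2025-04-01"),
    (105, "2025-02-28"),
]

# Distinct customers computed at the finer (region, date) grain...
per_date = {}
for cust, date in amer_orders:
    per_date.setdefault(date, set()).add(cust)

# ...cannot be summed back up to the region grain:
rolled_up = sum(len(custs) for custs in per_date.values())  # 4 (counts 104 twice)
true_count = len({cust for cust, _ in amer_orders})         # 3

print(rolled_up, true_count)  # rolling up a finer grain overcounts repeat customers
```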

After the refresh completes, this query can use the new materialized result.

One caveat: right after you change the view, you may still see the old plan because the materialization pipeline runs asynchronously in the background and needs time to finish.

You can also trigger view materialization manually:

REFRESH MATERIALIZED VIEW order_metrics_mat_agg;

The more dimensions you materialize together, the more compute and storage the system needs, so use this feature selectively.

Unaggregated Materialization

Another materialization type is unaggregated. With this approach, we can pre-materialize the joined data model (fact plus dimensions flattened). It is very useful when joins are the expensive part, not the aggregation:

-- Unaggregated Materialization
-- Pre-materialize the joined data model (fact + dimensions flattened).
-- Useful when joins are the expensive part, not the aggregation.
CREATE OR REPLACE VIEW order_metrics_mat_unagg WITH METRICS LANGUAGE YAML AS
$$
version: "1.1"
source: demo.metric_views.orders_denorm
dimensions:
  - name: region_name
    expr: region_name
  - name: customer_name
    expr: customer_name
  - name: order_date
    expr: order_date
measures:
  - name: total_revenue
    expr: SUM(amount)
  - name: customer_count
    expr: COUNT(DISTINCT customer_id)
  - name: revenue_per_customer
    expr: SUM(amount) / COUNT(DISTINCT customer_id)
materialization:
  mode: "relaxed"
  schedule: "EVERY 1 HOUR"
  materialized_views:
    - name: mv_unaggregated
      type: unaggregated
$$;
-- Both of these queries benefit from the pre-joined data:
SELECT
  region_name,
  MEASURE(total_revenue)
FROM
  order_metrics_mat_unagg
GROUP BY
  region_name;

SELECT
  customer_name,
  MEASURE(total_revenue)
FROM
  order_metrics_mat_unagg
GROUP BY
  customer_name;

That is Databricks metric views in a nutshell. The feature set is still evolving, but it is already a strong way to define reusable business metrics in one place.

Power BI support for metric views is currently in Beta and relies on BI compatibility mode, which rewrites standard Power BI aggregations to the underlying metric view definition. The docs also note that metric views with materializations are not supported in that mode.

Metric views are Unity Catalog securable objects, so you can manage permissions there, and Unity Catalog lineage helps you trace how data is produced and consumed.