← Back to Blog

Building a Production-Grade Medallion Architecture on Microsoft Fabric

An end-to-end Bronze → Silver → Gold build on Microsoft Fabric using AdventureWorks as an on-premises source — covering RLS/OLS enforcement, FinOps (61.5% compute reduction), fan-out orchestration, KQL monitoring, and diagnosing a real platform bug through controlled isolation testing.

Building a Production-Grade Medallion Architecture on Microsoft Fabric

An end-to-end Bronze → Silver → Gold build on Microsoft Fabric, orchestrated, monitored, and cost-optimized — using AdventureWorks as an on-premises source system, with everything that went wrong, got fixed, and got measured along the way.


Why This Project

Most "I built a medallion architecture" write-ups stop at the happy path: ingest, transform, model, ship a report. This one doesn't, on purpose. The goal wasn't to prove Fabric can ingest a SQL Server database into a Lakehouse — that part is well-documented and, frankly, easy. The goal was to push into the parts that don't appear in quickstart docs: row- and object-level security that actually enforces instead of just looking like it does, a certification workflow that functions as a real governance process rather than a checkbox, a FinOps pass that turns "this notebook feels slow" into a measured and attributable number, scheduled orchestration that is accountable for its own cost, live alerting tested against real triggered activity — and, because it happened, a genuine platform-level bug diagnosed through controlled isolation testing rather than guessing.

The full stack: on-premises SQL Server (AdventureWorks) → on-premises data gateway → Fabric Data Factory pipeline (weekly schedule, fan-out / fan-in orchestration) → Bronze Lakehouse → PySpark Silver and Gold notebooks → Direct Lake semantic model → certified, governed executive report → Workspace Monitoring + Activator alerting.

End-to-end medallion pipeline architecture on Microsoft Fabric

End-to-end medallion pipeline — SQL Server to Direct Lake semantic model with Workspace Monitoring

The three Lakehouses that form the physical medallion layers in the WS_LC-Dev workspace — each automatically provisioned with its own SQL analytics endpoint:

Fabric workspace showing lh_adventureworks_bronze, lh_adventureworks_gold, and lh_adventureworks_silver Lakehouses with their SQL analytics endpoints

The three-layer Lakehouse structure in WS_LC-Dev — Bronze, Silver, and Gold each have a Lakehouse item and an auto-provisioned SQL analytics endpoint. No endorsement or sensitivity label yet — those come after the certification workflow in Part 4.


How to Use This Guide

This post is written as a reproducible build guide — not a summary of what was built, but a step-by-step walkthrough you can follow to build it yourself. Each Part opens with what needs to be in place before you start, walks through the configuration and the decisions, and flags the exact failure modes you are likely to hit.

Prerequisites

Requirement Notes
Microsoft Fabric capacity F2 minimum; F8 recommended based on measured workload — see Part 6
AdventureWorks 2025 SQL Server database on a Windows machine you control
Microsoft 365 / Entra admin access To create security groups and assign Fabric licences
Fabric tenant admin access To enable gateway, Git, and certification settings in the Admin portal
GitHub account For Git integration in Part 5 (free account is sufficient)
Tabular Editor 2 (free) For Object-Level Security — tabulareditor.com
DAX Studio (free) For RLS identity testing — daxstudio.org

Build sequence

Each Part builds on the previous one. This table shows the order, what you produce, and what must already be in place:

Step Covers What you build Hard dependency
0 Gateway setup Secure channel from on-premises SQL Server to Fabric SQL Server running and reachable
1 Part 1 Bronze ingestion pipeline + Silver and Gold PySpark notebooks Gateway + connection working
2 Part 2 Security groups, workspace roles, RLS, OLS, OneLake roles Gold Lakehouse populated
3 Part 3 Direct Lake semantic model + three-page executive report Part 2 security layer in place
4 Part 4 Certification, lineage view, Purview audit logging Report published in Fabric
5 Part 5 Git source control via GitHub GitHub PAT (classic) and empty repo ready
6 Part 6 FinOps measurement + capacity right-sizing At least one full pipeline run complete
7 Part 7 Fan-out / fan-in orchestration pipeline with weekly schedule All notebooks tested individually
8 Part 8 Workspace Monitoring KQL + Activator email alerting Orchestration pipeline scheduled and running

Joining at a specific part? The gateway setup and Part 1 are the hard dependencies — everything else builds on a populated Gold Lakehouse. Each subsequent Part lists exactly what needs to be in place at the top of the section.


Before You Start — Bridging On-Premises SQL Server to Microsoft Fabric

Before a single pipeline runs, Fabric needs a secure, authenticated channel to reach a database that sits behind a corporate firewall or on a local machine. That channel is the on-premises data gateway — a lightweight Windows service you install on a machine that can reach the SQL Server, which then relays encrypted traffic between the Fabric service and the on-premises source. Nothing in the Fabric cloud ever touches your database directly; the gateway is the only point of contact.

This step is often glossed over in tutorials because it is infrastructure, not data engineering. In practice it is the step that blocks the most beginners, because the gateway, the connection object, and the pipeline all have to be configured consistently before anything else works.

Step 1 — Install and register the gateway

Download the on-premises data gateway installer from aka.ms/opdg and run it on the Windows machine that hosts — or can network-reach — your SQL Server instance. Standard install mode (not personal mode) is required for Fabric pipelines.

On-premises data gateway installer running

The installer runs silently after accepting the terms — no configuration choices at this stage.

Once installation completes, the gateway registration screen opens. Three fields matter here:

Gateway registration form — gateway name, recovery key, region, and signed-in account

Registration form — name the gateway, set the recovery key, and confirm the region before clicking Configure.

Field Value used Why it matters
Gateway name lc_on_prem_gateway This name appears in Fabric's connection UI — make it descriptive and environment-prefixed
Recovery key (stored in secrets vault) Required to restore the gateway if the host machine changes or is rebuilt — store it before clicking Configure
Region Canada Central Must match your Fabric capacity region. A mismatch forces all traffic to cross regions, adding latency and potentially violating data residency requirements
Sign-in account guyfomen@lanacloud.com The Entra account that "owns" the gateway — whoever manages it later needs access through this account

After clicking Configure, wait for the status screen:

Gateway status screen showing lc_on_prem_gateway online and ready, with Microsoft Fabric listed as a connected service

"The gateway lc_on_prem_gateway is online and ready to be used." — Microsoft Fabric (Default environment) shows Ready in the connected services list.

The connected services panel is the most important thing to verify on this screen. Microsoft Fabric must appear with "Ready" status. If it is missing or shows an error, the pipeline will fail to authenticate before it even reaches the SQL Server. Gateway version 3000.322.3 (June 2026 release) was used here.

Step 2 — Create the connection object in Fabric

The gateway makes the channel available; a connection object tells Fabric how to use it for a specific data source. In Fabric, go to Settings → Manage connections and gateways → New connection.

New connection panel — On-premises type, lc_on_prem_gateway, SQL Server, localhost, AdventureWorks2025

Connection form — connection type: On-premises. Gateway: lc_on_prem_gateway. Connector: SQL Server. Server: localhost. Database: AdventureWorks2025.

The connection configuration used for this project:

Setting Value
Connection type On-premises
Gateway cluster lc_on_prem_gateway
Connection name conn_adventureworks_onprem
Data source type SQL Server
Server localhost
Database AdventureWorks2025
Authentication method Basic
Username aw_fabric_reader

Two points worth emphasising about the credential and privacy settings at the bottom of the same form:

Connection form lower section — credentials encrypted on-premises, Privacy level: Organizational, SSO Kerberos unchecked

Credential note: "encrypted using the key stored on-premises on the gateway server" — the password never leaves the on-premises machine in plaintext. Privacy level: Organizational.

Dedicated service account, not a personal login. The username aw_fabric_reader is a SQL Server login created specifically for this pipeline with read-only access to AdventureWorks2025. Using a personal account means the pipeline breaks the moment that person changes their password or leaves the organisation. Using sa or a DBO-level account violates the principle of least privilege. A named, read-only service account is the right pattern.

Privacy level: Organizational. This setting controls how Power Query isolates this data source when combining it with others. "Private" would prevent the gateway from passing query results to any other source — which silently breaks pipelines that join data across connections. "None" removes isolation entirely. "Organizational" is the correct default for internal enterprise data: it allows data sharing within the Fabric tenant while keeping it isolated from external public sources.

SSO via Kerberos is available but was not enabled here. Kerberos SSO passes the end-user's Entra identity through to the SQL Server — useful when RLS is enforced at the database level, not needed when the service account already carries the correct permissions.

Step 3 — Verify the connection is registered

After saving, navigate to Manage connections and gateways to confirm the connection appears and shows a healthy status:

Manage connections and gateways — conn_adventureworks_onprem showing SQL Server type, green status, lc_on_prem_gateway

conn_adventureworks_onprem — SQL Server type, Gateway: lc_on_prem_gateway, Status: connected (green indicator).

conn_adventureworks_onprem appears with a green status indicator. This connection object is now available to select in any Fabric Data Factory pipeline activity that reads from AdventureWorks2025 — no re-entering credentials each time, no hardcoded passwords in pipeline parameters.

Learner note: If the status shows an error at this stage, check in this order: (1) Is the gateway service running on the host machine? (2) Does the machine have network access to the SQL Server on port 1433? (3) Is the aw_fabric_reader login enabled in SQL Server with the correct database permissions? These three account for the majority of connection failures.


Part 1 — Modeling Decisions That Don't Show Up in Tutorials

Before starting this part: Bronze, Silver, and Gold Lakehouses must exist in the WS_LC-Dev workspace (they can be empty). The on-premises gateway and conn_adventureworks_onprem connection from the gateway setup must be working. Run the Bronze ingestion pipeline once to confirm data lands in lh_adventureworks_bronze before building the Silver and Gold notebooks.

A calendar dimension that tells the truth

Every Gold dimension in this project traces back to a real Bronze source table. dim_date doesn't — there is no Date table in AdventureWorks. It is generated directly in the Gold notebook using Spark's sequence() function, producing one row per calendar day across the full span of order activity.

The deliberate detail is continuity. Pulling distinct dates from fact_sales would produce rows only for days that had a sale, silently hiding every zero-activity day. A generated, continuous calendar means a "sales by day" trend correctly shows a gap as a gap, rather than that day simply not existing in the dataset. Small decision, but it is the difference between a date dimension that supports honest trend analysis and one that quietly lies by omission.

Resolving a many-to-many relationship without breaking grain

Building dim_customer surfaced a real modeling problem: BusinessEntityAddress is a genuine many-to-many bridge — a customer can have several address types on file (Billing, Shipping, Home, Main Office) simultaneously. A naïve join against that bridge causes fan-out: any customer with two address types ends up duplicated in dim_customer.

The fix was a Spark window function — partition by BusinessEntityID, order each partition by a business-defined priority (Main Office → Billing → Home → anything else), rank with row_number(), and keep only rank 1. That deterministically collapses the relationship into exactly one row per customer, based on a rule about which address matters most — not "whichever row Spark happened to read first."

from pyspark.sql import Window
from pyspark.sql.functions import row_number, when, col

# Priority: Main Office=1, Billing=2, Home=3, others=4
addr_priority = (
    when(col("AddressType") == "Main Office", 1)
    .when(col("AddressType") == "Billing", 2)
    .when(col("AddressType") == "Home", 3)
    .otherwise(4)
)

window_spec = Window.partitionBy("BusinessEntityID").orderBy(addr_priority)

dim_customer = (
    df_address_bridge
    .withColumn("AddrRank", row_number().over(window_spec))
    .filter(col("AddrRank") == 1)
    .drop("AddrRank")
)

It also surfaced something worth checking explicitly: 19,579 address records resolved against 19,119 distinct customers, meaning roughly 3.3% of customers had no address on file. That is a LEFT JOIN, not an INNER JOIN — those rows survive as nulls instead of silently vanishing — and it became its own data-quality check rather than a surprise discovered later in a report.

Treating a five-row table with the same rigor as a 121K-row fact

The dynamic RLS mapping table — linking a user's identity to their sales territory — could have been typed directly into the semantic model in Tabular Editor in under a minute. It went through the full Gold pipeline anyway, deliberately. Every other table has lineage back to a source, passes through the shared DQ framework, and is visible in OneLake like any other governed asset. Making one small lookup table the exception would mean anyone reviewing the architecture later has to remember "that one table is different." Consistency is itself a form of documentation.

Building the shared data quality framework — nb_dq_utils

Every Silver notebook in this project runs the same data quality checks, but none of them define DQ logic directly. Instead, each Silver notebook calls a single shared utility notebook — nb_dq_utils — that owns all the DQ infrastructure: structured logging and a check runner that halts the pipeline on failure.

nb_dq_utils notebook in Microsoft Fabric — get_logger function and run_dq_checks function visible. PySpark (Python), Workspace default environment. Session ready in 11 sec.

nb_dq_utils — the shared DQ utility notebook. Two functions are defined here and imported by every Silver notebook: get_logger for structured logging and run_dq_checks for executing and enforcing quality checks.

The notebook defines two functions:

get_logger(name) — creates a Python logging logger scoped to a specific layer and table name. Every log line is prefixed consistently, which matters in Part 8 when these messages flow into Workspace Monitoring's ItemJobEventLogs table and need to be distinguishable by notebook name.

run_dq_checks(checks, logger, table_name) — takes a list of check dictionaries, each with a name, a passed boolean, and a message string. It iterates the list, logs [DQ] [PASS] or [DQ] [FAIL] for each check, and — if any check fails — raises an Exception that immediately halts the notebook:

raise Exception(
    f"{table_name} DQ failed — {len(failed)} check(s) failed: "
    f"{failed_names}. Pipeline halted."
)

The Pipeline halted. suffix is not decorative. When the orchestration pipeline in Part 7 runs this Silver notebook as an activity and the notebook raises an exception, the activity status flips to Failed — and the Gold notebook downstream of it is never triggered. Bad data cannot reach the Gold layer silently; it stops the run and leaves a named, logged explanation.

Each Silver notebook calls nb_dq_utils via Fabric's %run magic and passes its own check list. A new check is a new dict entry — no changes to nb_dq_utils, no copy-pasted logic:

%run nb_dq_utils

logger = get_logger("silver.sales_order")

run_dq_checks([
    {
        "name": "no_null_sales_order_id",
        "passed": df_sales_order.filter(col("SalesOrderID").isNull()).count() == 0,
        "message": "SalesOrderID contains nulls"
    },
    {
        "name": "referential_integrity_product",
        "passed": df_sales_order.join(df_product, "ProductID", "left_anti").count() == 0,
        "message": "SalesOrderDetail rows reference ProductIDs not in dim_product"
    },
    {
        "name": "no_negative_line_total",
        "passed": df_sales_order.filter(col("LineTotal") < 0).count() == 0,
        "message": "Negative LineTotal values found"
    }
], logger, "sales_order")

The architectural decision behind this pattern — one shared utility, not five copies — is the same logic as the five-row RLS table going through the full pipeline: consistency means there is exactly one place to update when the DQ framework needs to change, one place to look when a check fails, and one set of log formats across every notebook.

Learner note — FinOps consequence: The shared nb_dq_utils notebook appears in the Capacity Metrics app as its own line item (4.2% of workspace CU in Part 6). Even a utility notebook that does no Spark transformations has a session startup cost each time it is invoked. In high-frequency pipelines, consider whether DQ checks can be consolidated into the transformation notebook rather than a separate session. For this project's weekly schedule, the cost was acceptable.


Part 2 — Governance That Actually Enforces, Not Just Looks Like It

Before starting this part: Gold Lakehouse populated from at least one successful pipeline run (the fact_sales and dimension tables must exist). You need Entra admin access to create security groups and assign members, and both test user accounts must have Microsoft Fabric licences assigned in the Microsoft 365 admin centre.

Setting Up the Security Foundation

Every enforcement mechanism in this project traces back to three Entra security groups. Before any RLS rule, OneLake role, or workspace permission was configured, these groups had to exist and be consistently named — because every Fabric access control surface resolves membership through Entra at runtime, not through Fabric's own identity store.

Two accounts were provisioned in the Microsoft 365 Admin Center — Guy Merlin Fomen (gfomen@lanacloud.com) and Merlin Fomen (guyfomen@lanacloud.com), both with Microsoft Fabric (Free) licences:

Microsoft 365 Admin Center — two users: Guy Merlin Fomen (gfomen@lanacloud.com) and Merlin Fomen (guyfomen@lanacloud.com), both with Microsoft Fabric (Free) licence

M365 Admin Center — two accounts with Microsoft Fabric (Free) licences. All access control is enforced through Entra groups rather than individual user assignments, so direct user management inside Fabric is minimal.

In Microsoft Entra, three security groups were created with Cloud source and Assigned membership — meaning membership is explicit and auditable, not computed dynamically:

Azure Entra — Groups | All groups — AW-POC-Admins (b832279c...), AW-POC-Analysts (551f1b82...), AW-POC-Viewers (2591d9f1...) — all Security type, Assigned membership, Cloud source

Entra Groups — three security groups: AW-POC-Admins, AW-POC-Analysts, AW-POC-Viewers. The Object IDs (b832279c..., 551f1b82..., 2591d9f1...) are what Fabric resolves when checking access — keep group names and IDs stable across environments.

With groups in place, the Fabric workspace WS_LC-Dev was configured in Manage access to assign each group a workspace role. Why both data-access groups receive the same Viewer role — rather than mapping Analysts to a higher role — is the subject of the next section:

WS_LC-Dev Manage access — AW-POC-Admins: Admin, Merlin Fomen: Admin, AW-POC-Analysts: Viewer, AW-POC-Viewers: Viewer

WS_LC-Dev Manage access — AW-POC-Admins and Merlin Fomen as Admin; both AW-POC-Analysts and AW-POC-Viewers as Viewer. The access differential between the two groups is enforced inside the semantic model's RLS, not at the workspace role level.

The naming trap: workspace roles are not RLS roles

It is tempting to map job titles onto Fabric workspace roles — "Analyst" sounds more senior than "Viewer." In Fabric's security model, that intuition works against you. A workspace role (Admin / Member / Contributor / Viewer) governs whether someone can edit Fabric items. Anyone at Contributor or above can open a semantic model directly and query it freely — meaning Row-Level Security has nothing left to enforce against them, because per Microsoft's own documentation, anything above Viewer bypasses RLS entirely.

The fix: both the Analysts and Viewers security groups in this project received the Viewer workspace role. The actual difference in what each group sees lives one layer up, inside the semantic model's RLS rules: Analysts get full transaction detail filtered to their territory, Viewers get aggregated summaries only. Two access models, deliberately layered, both depending on getting the workspace role right first.

The RLS roles themselves are created in Manage security roles inside the semantic model's web editor in Fabric. Three roles — Admin, Analyst, and Viewer — were created here with DAX filter expressions appropriate to each:

Fabric Manage security roles — three roles listed: Admin, Analyst, Viewer. dim_customer selected. Rules panel empty (Admin has no row filter). Success banner: 'Successfully applied role changes.'

Manage security roles — Admin, Analyst, and Viewer created in the Fabric web model editor. The "Switch to DAX editor" button on the right is where territory-filter expressions are added per role. The success banner confirms role changes were saved.

OneLake security: closing the gap Power BI RLS leaves open

Power BI RLS only protects someone viewing a published report. It has zero effect on someone who connects directly to the Lakehouse via a notebook, the SQL analytics endpoint, or Excel. OneLake security closes that gap. From any Lakehouse item, "Manage OneLake security" lets you define custom roles scoped to specific tables — independent of anything built later in Power BI, and assignable to the same Entra security groups already governing the rest of the project. Two independent enforcement points protecting the same boundary: defense in depth, not a single point of failure.

The entry point is the Manage OneLake security button in the Lakehouse explorer's top ribbon:

Fabric Lakehouse explorer with 'Manage OneLake security' button highlighted in the top ribbon

Lakehouse explorer — "Manage OneLake security" in the top ribbon opens the role management panel. This is a Lakehouse-level control that operates independently of any downstream semantic model or report.

Clicking it opens the OneLake security panel. From there, + New launches a three-step wizard:

Step 1 — Define the role name and type. The role is named AWPOCAnalysts, type Grant, with Read permission. Read is on by default; ReadWrite is only added when the role needs write access to Lakehouse data:

New role wizard — Role step: name AWPOCAnalysts, type Grant, Read checked, ReadWrite unchecked

Role definition — Grant type with Read-only permissions. ReadWrite is explicitly unchecked — Analysts need to query the data, not modify it.

Step 2 — Select the data. Choosing "Selected data" rather than "All data" scopes the role to specific tables. For AWPOCAnalysts, only the five Sales schema tables are included — any other table in the Lakehouse is inaccessible to this role regardless of workspace assignment:

New role wizard — Data step: Selected data, dbo schema with Sales_Customer, Sales_SalesOrderDetail, Sales_SalesOrderHeader, Sales_SalesPerson, Sales_SalesTerritory — all Read

Data scope — five Sales tables assigned to this role. Tables not listed (HumanResources, Production, Purchasing) are inaccessible to role members at the OneLake layer — not just hidden from a report.

Step 3 — Add members. The Entra security group AW-POC-Analysts is added directly by email. Any future member added to the Entra group automatically inherits this OneLake role without any change to the Lakehouse configuration:

New role wizard — Member step: AW-POC-Analysts added as Group via Email

Member assignment — AW-POC-Analysts added at the group level. Membership changes in Entra propagate to OneLake access automatically, with no manual role update required.

After creation, the OneLake security panel shows both the custom role and the built-in DefaultReader:

OneLake security panel — two roles: AWPOCAnalysts (Grant, Read) and DefaultReader (Grant, Read)

OneLake security roles — AWPOCAnalysts (scoped to Sales tables) alongside the built-in DefaultReader. The custom role's table restriction enforces at the storage layer, independently of whatever Power BI RLS does downstream.

True Object-Level Security still requires leaving the browser

Fabric's web modeling experience now handles most semantic modeling tasks that used to require Tabular Editor — table selection, relationships, even static RLS roles. OLS is the one piece that still does not fully live there. The web UI's "Hide in report view" toggle looks like OLS, but it only removes a column from the report-author's field list — the data stays fully queryable by anyone with model access through a new report, Excel, or the XMLA endpoint directly.

Real OLS enforces the restriction at the security-role level, making a column genuinely inaccessible regardless of how someone tries to query it, and currently requires a tool connected via the workspace's XMLA endpoint. Tabular Editor (the free version is sufficient) is the standard choice.

One architectural detail worth knowing when working with Direct Lake: Tabular Editor and DAX Studio both advertise automatic connection when launched from Power BI Desktop's External Tools ribbon, but that handoff fails silently for live-connected Direct Lake models — there is no local database for the tool to find, because Desktop is not hosting a copy of anything; it is a viewer pointed at a model that physically lives in the Fabric service. The fix is connecting directly to the workspace's XMLA connection link (powerbi://api.powerbi.com/v1.0/myorg/{workspace}) in "Tabular Server" mode rather than relying on auto-detect. As more models move to Direct Lake, this is the practical skill that replaces the old "launch from Desktop and it just connects" workflow.

Once connected, Tabular Editor exposes the full Object Level Security configuration under each column's Translations, Perspectives, Security section in the properties panel. The example below shows dim_employee[BirthDate] — a column that should be inaccessible to Analysts but readable by Admins. The Analyst role is selected in the Object Level Security list and its value is set to None (restricted), while Admin stays Default (inherited) and Viewer also stays Default:

Tabular Editor 2.28.0 — dim_employee table, BirthDate column selected, Object Level Security: Admin=Default, Analyst=None, Viewer=Default

Tabular Editor — dim_employee[BirthDate] with Object Level Security: Analyst set to None (genuinely inaccessible), Admin and Viewer set to Default (inherited). This column will not appear for anyone in the Analyst role regardless of how they query the model — report, Excel, or XMLA endpoint.

Setting None on a column makes it genuinely inaccessible at the security-role level — not merely hidden from the field list. A member of the Analyst role querying the model directly through the XMLA endpoint will receive an error rather than a value, which is the behaviour the "Hide in report view" toggle does not produce.

Proving RLS works — with a real identity, not a placeholder

Power BI Desktop's "View as Roles" stays permanently grayed out for Direct Lake live connections, and there is no direct equivalent in the Fabric web UI. The actual feature lives in DAX Studio's Advanced → View As dialog, which supports impersonating a role, a user, or both.

The first real test failed in an instructive way: the mapping table had been seeded with invented placeholder UPNs, and DAX Studio rejected the impersonation outright — "the database doesn't exist, or you are not authorized to access it." Impersonating an identity for RLS evaluation requires that identity to actually resolve against Entra; a UPN string sitting in a Delta table is just text until Entra confirms a real account behind it. Switching to a real account mapped to the Northwest territory worked immediately and returned exactly one row — Northwest, and only Northwest. That is the difference between "the logic should work" and actual, observed proof.


Part 3 — Building the Executive Report

Before starting this part: The Gold Lakehouse (lh_adventureworks_gold) must be populated with all dimension and fact tables. The sm_adventureworks_gold Direct Lake semantic model must exist — if it does not, create it via New item → Semantic model in the Fabric workspace, selecting the Gold Lakehouse as the source. Security groups from Part 2 must be in place before you publish the report, because the RLS roles reference them.

The executive report has three pages — Executive Summary, Sales Detail, and Product Detail — all served from the same Direct Lake model. Build them in order; the drillthrough pages depend on the Executive Summary page's fields.

Step 1 — Build the Executive Summary page

The landing page contains four KPI cards and three chart visuals:

Executive Summary Report page — four KPI cards: Total Revenue 109.85M, Total Orders 31K, Avg Order Value 3.49K, Revenue YoY -41.44%. Line chart of revenue by month, horizontal bar by territory, donut by category.

Executive Summary — four KPI cards and three charts against the full AdventureWorks dataset (109.85M total revenue, 31K orders). The Certified badge is applied in Part 4 after the governance workflow.

KPI card measures:

Card DAX Format
Total Revenue Total Revenue = SUM(fact_sales[LineTotal]) $M
Total Orders Total Orders = DISTINCTCOUNT(fact_sales[SalesOrderID]) K
Avg Order Value Avg Order Value = DIVIDE([Total Revenue], [Total Orders]) $
Revenue YoY % Revenue YoY % = DIVIDE([Total Revenue] - CALCULATE([Total Revenue], SAMEPERIODLASTYEAR(dim_date[Date])), CALCULATE([Total Revenue], SAMEPERIODLASTYEAR(dim_date[Date]))) %

Chart visuals:

  • Total Revenue by Year and Month Name — Line chart. X-axis: dim_date[MonthName], Y-axis: [Total Revenue]. ⚠️ Set MonthName's Sort by Column to dim_date[MonthNumber] in the model and set the visual axis sort to MonthName Ascending — without both, the axis silently sorts by revenue value and produces a misleading trend line.
  • Total Revenue by TerritoryName — Horizontal bar. Y-axis: dim_territory[TerritoryName], X-axis: [Total Revenue].
  • Total Revenue by CategoryName — Donut chart. Legend: dim_product[CategoryName], Values: [Total Revenue].

Step 2 — Build the Sales Detail drillthrough page

The Sales Detail page is a drillthrough destination — users reach it by right-clicking any data point on the Executive Summary and selecting Drill through → Sales Detail:

Sales Detail drillthrough page — matrix visual with OrderDateKey, SalesOrderID, CustomerFullName hierarchy and Total Revenue column. Grand total 109,846,381.40.

Sales Detail page — matrix visual with OrderDateKey → SalesOrderID → CustomerFullName hierarchy. The grand total (109,846,381.40) must match the Executive Summary's Total Revenue exactly — if it doesn't, check the grain trap below.

Matrix visual configuration:

Setting Value
Rows dim_date[OrderDateKey]fact_sales[SalesOrderID]dim_customer[CustomerFullName]
Values [Total Revenue] measure
Drillthrough field dim_date[OrderDateKey] (add to the Drillthrough well in the Visualizations pane)

⚠️ Grain trap: If the matrix total is approximately 27× the expected revenue, you used fact_sales[TotalDue] instead of [Total Revenue]. TotalDue is stored at order-grain; fact_sales is at line-item-grain — summing TotalDue multiplies every order's total by its line-item count. Always use the [Total Revenue] measure (built on LineTotal) for any revenue aggregation.

Step 3 — Build the Product Detail drillthrough page

The Product Detail page shows revenue by individual product and subcategory, reachable by right-clicking any slice of the Executive Summary donut chart:

Product Detail drillthrough page — table with Name, SubcategoryName, and Total Revenue columns. Products listed alphabetically by subcategory, total 109,846,381.40.

Product Detail page — table visual with product Name, SubcategoryName, and Total Revenue, drillthrough-connected to CategoryName from the Executive Summary donut chart.

Table visual configuration:

Column Source
Name dim_product[Name]
SubcategoryName dim_product[SubcategoryName]
Total Revenue [Total Revenue] measure

Set the Drillthrough field to dim_product[CategoryName] so the page is reachable from the Executive Summary donut.

Step 4 — Publish and verify

Once all three pages are built in the Fabric browser editor:

  1. Save the report — it is already live in Fabric; there is no separate publish step for reports built natively in the service.
  2. Confirm the grand totals on Sales Detail and Product Detail match Executive Summary's [Total Revenue] (109,846,381.40 in this dataset).
  3. Verify access for AW-POC-Analysts members under Share → Manage permissions.
  4. Do not apply the Certified badge yet — that is Part 4.

Four things to watch for

1. Month sort order requires two separate settings. Set dim_date[MonthName]'s Sort by Column to dim_date[MonthNumber] at the model level in the semantic model's web editor. Then, in the visual's Format pane, set the X-axis sort to MonthName Ascending. Missing either step gives a trend line that looks plausible but is structurally wrong.

2. View as Roles is permanently grayed out for Direct Lake. Use DAX Studio's Advanced → View As for all RLS testing — covered in Part 2. This is not a bug you can work around in Power BI Desktop.

3. Stray page filters override drillthrough defaults silently. If a drillthrough page shows unexpected filtered data when navigated to directly, open the Filters pane → Filters on this page section and check for filters that do not belong there. They are not visible in the visual-level filter pane.

4. You cannot download a Direct Lake report as .pbix. This is a confirmed platform constraint — Direct Lake content lives in the service. Build, iterate, and publish entirely in the Fabric browser.


Part 4 — Certification, Lineage, and a Real Permissions Investigation

Before starting this part: The executive report must be published in Fabric (Part 3 complete). You need tenant admin access to the Fabric Admin portal to enable the Certification setting. The certifying account must be a direct member of the AW-POC-Admins Entra security group — workspace Admin role alone is not sufficient.

Closing the governance loop meant three more phases, each demonstrating a different facet of maturity rather than a checkbox.

Certification: the two-layer admin gate

Certification in Fabric is gated at two independent levels. First, a tenant admin must enable it — and the scope matters: enabling for "The entire organization" means any user can certify anything, which defeats the purpose. Scoping to a specific security group means only members of that group can apply the certification badge:

Fabric Admin portal — Certification setting: Enabled for specific security groups, AW-POC-Admins selected. Domain admins can enable/disable checked.

Admin portal Certification setting — enabled for AW-POC-Admins only. Without this, the "Certified" option simply does not appear in the endorsement UI regardless of workspace role.

Second, the certifying account must be a direct member of the authorized security group — not just a workspace Admin or Owner. Workspace role assignment does not imply security group membership. Once both conditions were met, the endorsement panel on the semantic model offered "Certified" as a selectable option:

Semantic model endorsement settings — Certified selected, Make discoverable checked. Warning: 'Others in your org will be able to find it by name, tables, columns, etc.'

Endorsement and discovery — Certified selected, Make discoverable checked. The discoverability warning is intentional: this dataset is meant to be findable by analysts across the tenant as the authoritative source.

With certification applied, the semantic model shows the Certified badge in the workspace's folder view — timestamped and attributed to the certifying account:

WS_LC-Dev folder 04-Semantic Model — sm_adventureworks_gold with Certified badge in Endorsement column, owner WS_LC-Dev, refreshed 6/21/2026 10:46

sm_adventureworks_gold — Certified badge visible in the workspace folder view. Git status shows Uncommitted, which is expected for a Direct Lake model before the Git export workaround described in Part 5.

The executive report was certified separately via its own settings panel — certification does not cascade from the semantic model to dependent reports automatically:

Executive Summary Report settings panel — Certified endorsement selected, Feature on Home checked, Contact: Merlin Fomen

Executive Summary Report settings — Certified + Feature on Home. "Feature on Home" surfaces the report in the Featured section on Power BI Home for users with access — a lightweight distribution step that requires no app publication.

Lineage View and Impact Analysis

With both items certified, the Lineage view in WS_LC-Dev confirmed the full chain from storage to consumer:

Fabric Lineage view — lh_adventureworks_gold → nb_gold_fact_sales → sm_adventureworks_gold (Certified badge) → Executive Summary Report (1 Viewers, 15 Views)

Lineage view — Gold Lakehouse → nb_gold_fact_sales notebook → sm_adventureworks_gold (Certified) → Executive Summary Report. The report shows 1 Viewer and 15 Views — confirming the RLS identity test from Part 2 generated real tracked usage, not just local preview activity.

An unplanned detour into Impact Analysis turned out to be the more genuinely useful artifact of the two. Clicking "Impacted by this Lakehouse" on lh_adventureworks_silver returned a precise, enumerated answer rather than requiring inference:

Impact Analysis panel for lh_adventureworks_silver — 7 impacted child items: 6 Notebooks (nb_silver_sales_order, nb_silver_product, nb_silver_territory, nb_silver_employee, nb_silver_address, nb_gold_fact_sales) and 1 SQL analytics endpoint, across 1 workspace

Impact Analysis — lh_adventureworks_silver: 7 impacted child items across 1 workspace. Six notebooks depend on it directly (including nb_gold_fact_sales which reads from Silver), plus the auto-provisioned SQL analytics endpoint. This is the answer to "what breaks if I touch this Lakehouse" without having to trace every dependency manually.

Asking "what breaks if I touch this Silver Lakehouse" returned 7 downstream items across 6 notebooks and 1 SQL analytics endpoint — a specific, actionable answer, not a guess. Lineage shows you the chain; Impact Analysis tells you the blast radius of a change before you make it.

Audit Logging: a Permissions Investigation

Microsoft Purview's Audit search is the authoritative log for Fabric and Power BI activity in the tenant. Getting to a working first search required resolving a non-obvious permissions split.

The initial search attempt failed identically across every date range and filter combination — a constant failure regardless of what was varied, which points at something structural rather than a bad input. The search UI looked correct; the results were simply absent:

Microsoft Purview Audit search — three searches completed: Jun 22-23 PowerBIAudit (0 results), Jun 22-23 (3 results), Jun 22-23 PowerBIAudit (0 results). Record type: PowerBIAudit.

Purview Audit search — three search attempts across the same Jun 22-23 window. The pattern (0 → 3 → 0 results) tells the story: the middle search was run after adding the Audit Reader role, the outer two show what the search returns without it.

The cause: holding broad admin role groups (Organization Management, Purview Administrators) does not include audit search rights in Microsoft's current permission model. Audit access is split into its own dedicated Audit Reader / Audit Manager role groups. The fix was adding the account directly to Audit Reader in Purview's Role groups settings:

Microsoft Purview Role groups — Audit Reader selected, Members tab showing Merlin Fomen (guyfomen@lanacloud.com, User, Organization)

Purview Role groups — Audit Reader, Members tab. Merlin Fomen added directly as a User member. This is a Purview-level permission, separate from Entra groups and Fabric workspace roles.

The full permissions picture after resolving the audit access issue is visible on the My permissions tab in Purview Role groups:

Microsoft Purview My permissions — role groups: Audit Reader (Mapped from Entra ID: No), Data Catalog Curators (Yes), Data Estate Insights Readers (Yes), Organization Management (Yes), Purview Administrators (Yes)

My permissions — five role groups in effect. The "Mapped from Entra ID: No" on Audit Reader is the diagnostic detail: it was added directly in Purview, not inherited through an Entra group. Organization Management and Purview Administrators both show Yes — but neither grants audit search rights, which is why the fix required Audit Reader specifically.

The first successful search still came back empty — audit recording only captures activity from the moment it is switched on. A second, correctly-windowed search returned real entries, including two attributed not to a person but to Power BI's own service-principal GUID refreshing its internal admin-monitoring models. Distinguishing "a real user action" from "normal platform housekeeping sharing the same audit pipeline" is itself the skill an audit log is meant to develop.


Part 5 — Git Integration: A Step-by-Step Setup and an Honest Platform Bug

Before starting this part: A GitHub account and an empty repository created (you will need to add at least one file to it before Fabric can connect — the setup flow will prompt you if you skip this). Your Fabric account must be a direct member of the AW-POC-Admins Entra security group, and you need Fabric tenant admin access to enable the Git integration settings in the Admin portal.

Fabric's native Git integration connects a workspace directly to a GitHub or Azure DevOps repository — enabling source control, CI/CD pipelines, and durable version history for notebooks, pipelines, and semantic models. Setting it up for this project required navigating a two-layer admin gate that is not well documented, a credential trap that produces a misleading error, an empty-repository blocker, and ultimately a genuine platform-side bug diagnosed through controlled isolation testing.

This section documents every step — including the failure modes — so you can reproduce it cleanly.

Phase 1 — Enable Git integration in the Admin portal (two separate toggles)

Toggle 1: the master switch

In the Fabric Admin portal → Tenant settings, search for "Git integration." The first setting — "Users can synchronize workspace items with their Git repositories" — is the master switch. Without it, the Git integration tab inside Workspace Settings shows nothing at all.

Admin portal — Git integration master toggle enabled for the entire organization

Master toggle — enable for the entire organisation or scope it to a specific security group. This controls all Git providers (Azure DevOps and GitHub).

Toggle 2: the GitHub-specific switch

Scroll down past the master toggle. A second, separate setting controls GitHub specifically: "Users can sync workspace items with GitHub repositories." This toggle only appears once the master switch is on. GitHub will not appear as a provider in Workspace Settings until both toggles are enabled — Azure DevOps only requires the first.

Admin portal — GitHub-specific toggle enabled for AW-POC-Admins security group, with multi-geo warning

GitHub-specific toggle — scoped to the AW-POC-Admins security group. Note the multi-geo warning: GitHub connections bypass regional data residency enforcement.

Two things worth noting on this screen. First, the security group AW-POC-Admins is used instead of "The entire organisation" — this limits Git integration to the accounts that actually need it, following least-privilege for tenant-level features. Second, the multi-geo warning is genuine: items connected to GitHub are not subject to Fabric's multi-geo data residency restrictions. If your organisation has compliance requirements about where data lives, this matters before you connect any workspace.

Wait for propagation

After clicking Apply on either setting, a notification confirms the change — but changes take up to 15 minutes to apply across the tenant.

Fabric notification: Tenant settings changes will be applied within the next 15 minutes

Do not test immediately after saving — wait the full 15 minutes or the Workspace Settings Git integration tab will appear as if neither switch was ever enabled.

Phase 2 — Workspace Settings: both providers now visible

After the propagation window, open Workspace Settings → Git integration. With both tenant toggles enabled, both Azure DevOps and GitHub appear as selectable providers.

Workspace Settings Git integration — Azure DevOps and GitHub both available as providers

Both providers visible — Azure DevOps (left) and GitHub (right). If only one toggle was enabled, only Azure DevOps would appear.

Phase 3 — Create a GitHub Personal Access Token (classic, not fine-grained)

This is the single most common point of failure. Fabric's Git integration requires a classic Personal Access Token (PAT). Fine-grained tokens — GitHub's newer, more permission-scoped token type — are rejected with a Git_GitProviderCredentialsNotAuthorizedError that gives no hint about which token type is the problem.

On GitHub, go to Settings → Developer settings → Personal access tokens → Tokens (classic):

GitHub Developer Settings — Personal access tokens, Tokens (classic) selected in the left nav

GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic). Do not use Fine-grained tokens — they will fail.

For reference, this is what the fine-grained token creation page looks like — the token named fabric-adventureworks-poc, scoped to the single repository GuyFomen/fabric-adventureworks-medallion-poc, with 90-day expiry:

GitHub fine-grained token creation — fabric-adventureworks-poc, 90 days, only select repositories

Fine-grained token — correctly scoped to one repository, but rejected by Fabric. The error message does not mention token type. Use Tokens (classic) instead.

Generate a classic token with the repo scope (full control of private repositories). Copy it immediately — GitHub will not show it again.

Phase 4 — Add the GitHub account in Fabric

Back in Workspace Settings → Git integration → GitHub, click Add account. The "There aren't any accounts" state is expected on first connection:

Fill in the Add GitHub account dialog:

Add GitHub account dialog — display name lanacloud-github, classic PAT, repository URL

Add GitHub account — display name: lanacloud-github, PAT: classic token, Repository URL: https://github.com/GuyFomen/fabric-adventureworks-medallion-poc

Field Value used
Display name lanacloud-github
Personal access token (classic PAT, repo scope)
Repository URL https://github.com/GuyFomen/fabric-adventureworks-medallion-poc

After clicking Add, the account appears in the provider panel and the Connect button becomes active:

Workspace Settings — lanacloud-github account added, Connect button active

Account lanacloud-github registered against the repository URL. Connect button is now active.

Phase 5 — Connect the repository and branch (empty repo blocker)

Click Connect. If the GitHub repository has no branches yet, Fabric shows a warning:

Connect Git repository and branch — Empty repository warning, no branches available

"Empty repository — make sure to create a branch first on the Git provider." The Branch dropdown is empty; Connect and sync remains greyed out.

Fix: Go to GitHub, create the main branch (or any branch) by committing any file — a README is enough. Return to Fabric and re-open the Branch dropdown.

After creating the branch in GitHub, the dropdown populates:

Branch dropdown showing New Branch option and main branch after creating it in GitHub

Branch dropdown after creating main in GitHub — both "New Branch" and "main" are now selectable. Choose main and proceed to Connect and sync.

Select main, leave the Git folder field empty (Fabric uses the repo root by default), and click Connect and sync.

Phase 6 — The platform bug: when the error is not yours to fix

With every configuration step correct — both admin toggles enabled, classic PAT, account registered, branch selected — the connect-and-sync step completed. The connection was established. But the subsequent commit action failed with Git_ItemNotFound.

Rather than keep guessing, a deliberate controlled isolation test:

  1. New workspace, same repo, new branch. Identical failure.
  2. Same throwaway workspace, new repo, new credential. Identical failure again.

Three independent variables — workspace, repository, credential — swapped one at a time, with the failure unchanged every time. What did stay constant across all three attempts was the backend cluster URI visible in the error detail. That is not a workspace setting or anything accessible from the Fabric UI — it is the regional backend cluster the tenant is assigned to.

Conclusion: this was not a misconfiguration. It is a fault in how that specific backend cluster handles Git item-lookup operations, sitting entirely server-side. On an enterprise tenant with a support contract, this is exactly the kind of finding that warrants a ticket with a clean three-step repro attached. On a personal trial tenant, there is no support channel — so the right move was to stop debugging a wall that is not yours to fix, document the root cause, and adopt a durable workaround:

  • Notebook code → exported as .ipynb files and pushed to GitHub manually
  • Semantic model → exported via Tabular Editor's TMDL export (connected via XMLA endpoint)
  • Pipelines → exported as JSON templates from the Data Factory UI

These exports became the permanent source-control path for this project — not a stopgap, but a deliberate decision once the root cause was confirmed as server-side.

The broader engineering lesson: when a cloud platform error persists identically across every client-side variable you can swap, that consistency is the diagnosis. A controlled, one-variable-at-a-time elimination test is the fastest way to distinguish "I configured something wrong" from "this is on the platform's side" — and which conclusion you reach determines what is actually worth your time next.


Part 5b — Deployment Pipelines: Promoting the Full Stack from Dev to UAT

Before starting this section: Both WS_LC-Dev and WS_LC-UAT workspaces must exist within the same Fabric capacity. You need workspace admin access on both. The full Dev workspace should be complete — all notebooks, pipelines, Lakehouses, semantic model, and report built and tested. A Fabric capacity admin must enable the Deployment Pipelines feature in the Admin portal if not already active.

A Fabric Deployment Pipeline is a structured promotion tool: it compares two workspace stages side by side (Development → UAT, or UAT → Production), shows exactly what is new or changed, and copies selected items from one stage to the next in a single click. Without this, promoting a medallion architecture across environments means manually recreating every notebook, pipeline, and semantic model — duplicating the exact kind of error-prone work this project was designed to avoid.

The pipeline for this project is dpl_adventureworks_medallion:

dpl_adventureworks_medallion deployment pipeline — Development stage (WS_LC-Dev) on the left with 16 items selected, UAT stage (WS_LC-UAT) on the right showing Deployed 6/28/2026 timestamp. Items visible include pl_full_medallion_run, nb_dq_utils, nb_gold_fact_sales, nb_silver_address, nb_silver_employee.

dpl_adventureworks_medallion — two-stage pipeline: Development (WS_LC-Dev) on the left, UAT (WS_LC-UAT) on the right. The stage arrow in the middle compares the two workspaces and shows what is new, modified, or identical between them.

Step 1 — Organise both workspaces with a consistent folder structure

Before deploying anything, both workspaces were given the same numbered folder structure. This is not enforced by Fabric — it is a deliberate convention that makes the inventory readable in both environments and prevents items from landing at the root level with no organisational context:

Fabric workspace navigator showing WS_LC-Dev and WS_LC-UAT side by side — both with identical folder structure: 01-Data, 02-Notebook, 03-Pipeline, 04-Semantic Model, 05-Report, 06-Monitoring Eventhouse and KQL, 07-My Activator, plus vl_adventureworks_lakehouses variable library in Dev.

Both workspaces share the same seven-folder structure. Items deployed from Dev land in the matching folder in UAT automatically. The vl_adventureworks_lakehouses variable library (visible in Dev) is the mechanism that tells notebooks which Lakehouse GUIDs to use per environment — see Step 2.

Step 2 — Create the variable library to handle environment-specific GUIDs

The single biggest challenge in promoting notebooks between environments is that Fabric Lakehouse IDs are GUIDs — globally unique identifiers that differ between Dev and UAT even when the Lakehouse has the same name. A notebook hardcoded to the Dev Bronze Lakehouse GUID will silently read from the wrong source in UAT.

The solution is a Fabric Variable Library (vl_adventureworks_lakehouses) — a workspace-level item that stores named variables with environment-specific value sets. Each notebook references LKH_bronze, LKH_silver, LKH_gold, and WorkspaceId by name; the active value set determines which GUID is actually resolved at runtime:

Variable library vl_adventureworks_lakehouses — four variables: LKH_bronze, LKH_silver, LKH_gold, WorkspaceId, all type Guid. Default value set (Active) shows Dev GUIDs. Alternative value set UAT shows the corresponding UAT Lakehouse and workspace GUIDs.

Variable library — four GUID variables, two value sets. When the active value set is switched from Default (Dev) to UAT, every notebook reading from this library resolves to the UAT Lakehouse IDs automatically — no notebook code changes required.

Variable Type Dev value UAT value
LKH_bronze Guid Dev Bronze Lakehouse ID UAT Bronze Lakehouse ID
LKH_silver Guid Dev Silver Lakehouse ID UAT Silver Lakehouse ID
LKH_gold Guid Dev Gold Lakehouse ID UAT Gold Lakehouse ID
WorkspaceId Guid Dev workspace ID UAT workspace ID

Step 3 — Know what cannot be deployed before you start

Fabric Deployment Pipelines support most item types, but not all. Clicking Deploy on the first attempt produced this warning:

Warning dialog: 'Workspace includes items that can't be deployed — Some items in this workspace are restricted or unsupported and won't be deployed to the next stage.' Listed: Monitoring_Eventstream, Monitoring Eventhouse, Monitoring KQL database.

Three Workspace Monitoring items are explicitly unsupported in deployment pipelines: Monitoring_Eventstream, Monitoring Eventhouse, and Monitoring KQL database. These must be configured manually in each environment. Clicking Continue proceeds with the deployable items and skips these three.

The three unsupported item types are all part of Workspace Monitoring (Part 8). They must be provisioned separately in UAT by enabling Workspace Monitoring in the UAT workspace settings — the same one-click activation used in Dev.

A second blocker appeared when the Data Activator item was included in the deployment selection:

Error dialog: 'Can't start the deployment — When deploying the items below, any related items must be included in the deployment or must already exist in the target folder. My activator listed.

'Can't start the deployment' — the Activator item has a dependency (the Monitoring Eventhouse it queries) that is itself not deployable. Since the dependency can't travel with it, the Activator can't be deployed either. Fix: deselect My Activator from the deployment and configure it manually in UAT after the Eventhouse is provisioned there.

Step 4 — Select items and deploy

With the Activator excluded and the Monitoring warning acknowledged, the deployment canvas shows 16 deployable items selected via Select related — which automatically includes all items that share dependencies with the manually selected ones:

Deployment canvas — WS_LC-Dev with 16 selected items. Checked: pl_bronze_ingest_adventureworks, pl_full_medallion_run, nb_dq_utils, nb_gold_fact_sales, nb_silver_address. Unchecked: My activator. UAT side shows orange spinner indicating in-progress comparison.

16 items selected — pipelines, all notebooks, Lakehouses, semantic model, and report. My Activator is explicitly deselected (no green check). The orange spinner on the UAT side indicates the pipeline is comparing Dev and UAT before enabling the Deploy button.

Clicking Deploy opens the confirmation dialog listing what will be created in UAT:

Deploy to next stage dialog — 'You are about to deploy from Development to UAT. New (17): KQLQuerySet_001, KustoQueryWorkbench_1, lh_adventureworks_bronze, lh_adventureworks_gold, lh_adventureworks_silver...' Continue deployment in case 1 or more items fail checkbox is checked.

Deploy to next stage — 17 New items going to UAT for the first time. 'Continue deployment in case 1 or more items fail' is checked — this ensures that a single item failure (e.g. a permissions issue on one notebook) does not block the rest of the deployment.

After deployment completes, the pipeline compares the two stages in real time:

Post-deployment view — Development (WS_LC-Dev) on left, UAT (WS_LC-UAT) on right showing Deployed 6/28/2026 12:59:40 PM. Centre arrow shows Comparing status. UAT side shows pl_full_medallion_run with hatched pattern indicating it is identical to Dev.

Post-deployment comparison — WS_LC-UAT timestamped 6/28/2026, 12:59:40 PM. The hatched pattern on items in UAT means they are now identical to their Dev counterparts. My Activator shows '+ New' in Dev because it was excluded and does not yet exist in UAT.

Step 5 — Configure deployment rules to remap environment references

Deploying copies item definitions exactly — which means any hardcoded Lakehouse reference in a notebook points to the Dev Lakehouse even after landing in UAT. Deployment rules are per-item overrides that rewrite specific references automatically during each deployment.

The Deployment rules panel (accessible from the UAT stage settings) lists every item that supports rules:

Deployment rules [UAT] panel — lists nb_dq_utils, nb_gold_fact_sales, nb_silver_address, nb_silver_employee, nb_silver_product, nb_silver_sales_order, nb_silver_territory (all Notebook type) and sm_adventureworks_gold (Semantic model type).

Deployment rules [UAT] — all seven notebooks and the semantic model support rules. Each item is configured separately. Rules run automatically on every subsequent deployment to this stage — you configure them once, not per-deploy.

For each notebook, the Additional lakehouses rule maps the Dev Lakehouse name to the UAT Lakehouse ID and workspace ID. Without this rule, the notebook in UAT still reads from the Dev Lakehouse:

nb_silver_sales_order Notebook deployment rules — Additional lakehouses: From lh_adventureworks_bronze (Dev) → To Other, Lakehouse id cc8c1494..., Lakehouse name lh_adventureworks_bronze, Lakehouse workspace id 0182df29...

nb_silver_sales_order deployment rule — the Dev Bronze Lakehouse is remapped to the UAT Bronze Lakehouse (different GUID, same name). This rule is applied automatically every time this notebook is deployed from Dev to UAT. Repeat this for every notebook that references a Lakehouse.

Step 6 — Handle the semantic model's OneLake URI

The semantic model presents a different challenge. The sm_adventureworks_gold model contains a Shared Expression named OneLakeURI — a parameterized M expression holding the full DFS path to the Dev Gold Lakehouse. This path contains the Dev workspace GUID, which is meaningless in UAT.

Viewing this expression in Tabular Editor (connected via the Dev XMLA endpoint) shows the hardcoded URI:

Tabular Editor — WS_LC-Dev.sm_adventureworks_gold Shared Expressions, OneLakeURI selected. Expression shows full onelake.dfs.fabric.microsoft.com URI with Dev workspace and Lakehouse GUIDs. IsParameterQuery=true, IsParameterQueryRequired=true, Type=Text.

OneLakeURI shared expression — a parameterized M expression embedding the Dev Lakehouse's full DFS path. Because it is marked IsParameterQuery=true, it is overridable via a deployment rule Parameter rule. The UAT value set in the variable library provides the correct UAT URI.

The sm_adventureworks_gold deployment rules panel shows where this override is configured — under Parameter rules, a rule maps OneLakeURI to the UAT Lakehouse URI on each deployment:

sm_adventureworks_gold semantic model deployment rules — Data source rules section (collapsed) and Parameter rules section showing 'No rules were added yet' with + Add rule button.

Semantic model Parameter rules — the + Add rule button is where you map the OneLakeURI parameter to the UAT Lakehouse DFS path. Once saved, every subsequent deployment automatically rewrites the URI, keeping the UAT model pointed at the correct UAT Lakehouse without manual edits in Tabular Editor.

To deploy the semantic model selectively (for example, after updating only the model definition without redeploying all notebooks), select it alone from the Dev canvas:

Deployment canvas — sm_adventureworks_gold selected (1 selected) in WS_LC-Dev. Items visible: nb_silver_employee, nb_silver_product, nb_silver_sales_order, nb_silver_territory, sm_adventureworks_gold (checked), Executive Summary Report (light check). UAT side shows matching items with +1 badge on the comparison arrow.

Selective deployment — only sm_adventureworks_gold selected. The Executive Summary Report is shown with a lighter check because it depends on the semantic model and will be included automatically. The +1 badge on the comparison arrow confirms exactly one item is new or changed relative to UAT.

Step 7 — Verify the UAT environment

After a successful deployment, the UAT Gold Lakehouse is accessible with the same structure as Dev — confirming that the medallion layers, tables, and endpoints reproduced correctly across the environment boundary:

UAT Gold Lakehouse (lh_adventureworks_gold) in Fabric — different workspace URL (UAT workspace GUID in the address bar). Fabric tabs show: Executive Summary Report, nb_dq_utils, dpl_adventureworks_medallion, sm_adventureworks_gold. Explorer shows Tables > dbo and Files. Update all variables button visible in toolbar.

UAT lh_adventureworks_gold — same structure as Dev, different workspace GUID in the URL. The 'Update all variables' button in the toolbar triggers the variable library to switch its active value set, updating all GUID references across notebooks simultaneously.

What the deployment pipeline does and does not handle

Item Deployed automatically Manual step required
Notebooks (all 7) Yes — code and dependencies copied Deployment rules to remap Lakehouse IDs
Pipelines (Bronze + orchestration) Yes Verify connection references point to UAT gateway
Lakehouses (Bronze, Silver, Gold) Yes — structure only, no data Data runs from the UAT pipeline schedule
Semantic model Yes Parameter rule for OneLakeURI; reconnect report
Executive Summary Report Yes Re-verify RLS roles reference UAT identities
Workspace Monitoring (Eventstream, Eventhouse, KQL) No — unsupported item type Enable Workspace Monitoring in UAT workspace settings
Data Activator No — dependency on undeployable Eventhouse Recreate manually in UAT after Eventhouse is provisioned
Security groups and workspace roles No Assign Entra groups to UAT workspace roles separately
On-premises gateway connection No Create conn_adventureworks_onprem in UAT pointing to same gateway

Part 5c — User Data Functions: Exposing Gold Data as a REST API

Before starting this section: The Gold Lakehouse must be populated and deployed to UAT (Part 2 and Part 5b complete). You need workspace member access on WS_LC-UAT. The User Data Functions feature must be enabled by a Fabric capacity admin in the Admin portal.

Everything built so far exposes data through the Power BI report or through notebooks triggered by a pipeline schedule. Fabric User Data Functions (UDF) add a third consumption channel: a serverless HTTP endpoint that any application, script, or API client can call directly — no Power BI license required, no Spark cluster to spin up. The function runs Python, queries the Gold Lakehouse via the connection alias you configure, and returns JSON. The result is a lightweight, authenticated REST API backed by your medallion Gold layer.

For this project, the item is udf_adventureworks_api:

Fabric code editor showing udf_adventureworks_api — User data functions item open in WS_LC-Dev. Functions explorer on the left shows hello_fabric. Code editor displays: import datetime, import fabric.functions as fn, import logging; udf = fn.UserDataFunctions(); @udf.function() decorator; def hello_fabric(name: str) -> str. Right panel shows Manage connections with empty alias/source/type/location table. Status bar: Test session active, Last published 06/29/2026 01:14:33 PM.

udf_adventureworks_api — the Fabric User Data Functions item. Language is Python. The @udf.function() decorator marks each Python function as an HTTP-invocable endpoint. The starter hello_fabric function is provided automatically; additional functions are added to the same item. The Manage connections panel on the right is where you wire up Lakehouse access before writing any query code.

Step 1 — Connect the function item to the Gold Lakehouse

User Data Functions access Fabric data sources through named connections configured in the Manage connections panel. Each connection gets an alias that the Python code uses — not a hardcoded Lakehouse path. This is the same environment-isolation principle as the variable library: the code references the alias, and the alias is wired to the appropriate Lakehouse per environment.

The Insert sample menu (Edit tab → Insert sample → Lakehouse) shows all the patterns the IDE can scaffold for you — including Query data from lakehouse tables, which generates the correct fabric.functions boilerplate:

Edit tab toolbar with Insert sample dropdown open — categories: Warehouse, API for GraphQL, Lakehouse (expanded: Read a parquet file from a lakehouse, Write csv file into a lakehouse, Read csv file from lakehouse, Query data from lakehouse tables), SQL Database, Variable Library, Data Manipulation, UDF Data Types, Cosmos DB, Business Events. Functions explorer shows hello_fabric and query_data_from_tables.

Insert sample → Lakehouse → Query data from lakehouse tables inserts a working boilerplate that reads the connection alias from the Manage connections panel. The inserted code is environment-portable — point the alias at a different Lakehouse and the function queries that Lakehouse without any code changes.

After using + Add connection to wire up the UAT Gold Lakehouse, the connection is registered with the alias GoldLakehouse:

Manage connections panel — udf_adventureworks_api — one connection: Alias GoldLakehouse, Source lh_adventurework..., Type Lakehouse, Location WS_LC-UAT. Plus Add connection button below.

Connection alias GoldLakehouse wired to lh_adventureworks_gold in WS_LC-UAT. The Python code references this alias by name — if you later point the alias at the Dev Lakehouse for local testing, no function code changes.

Step 2 — Write the API functions

Two functions are implemented in udf_adventureworks_api:

get_sales_summary — accepts a date range and returns total sales (LineTotal) and order count from fact_sales in the Gold layer:

import fabric.functions as fn
from datetime import datetime

udf = fn.UserDataFunctions()

@udf.function()
def get_sales_summary(startDate: str, endDate: str) -> dict:
    """
    Total sales (LineTotal) and order count from Gold Fact Sales for a date range.
    Args:
        startDate (str): "YYYYMMDD", e.g. "20220101"
        endDate (str): "YYYYMMDD", e.g. "20221231"
    Returns:
        dict with total_sales, line_count, startDate, end_date
    """
    lh = udf.get_lakehouse_connection("GoldLakehouse")
    df = lh.execute_query(
        f"""
        SELECT SUM(LineTotal) AS total_sales, COUNT(*) AS line_count
        FROM dbo.fact_sales
        WHERE OrderDateKey BETWEEN '{startDate}' AND '{endDate}'
        """
    )
    row = df.fetchone()
    return {
        "total_sales": float(row[0]) if row[0] else 0.0,
        "line_count":  int(row[1])   if row[1] else 0,
        "startDate":   startDate,
        "end_date":    endDate,
    }

get_gold_freshness — no parameters; returns the most recent OrderDateKey in the Gold table and the number of days since that date — a data freshness signal that a monitoring dashboard or alerting system can poll:

@udf.function()
def get_gold_freshness() -> dict:
    """
    Most recent OrderDateKey in Gold Fact Sales — a freshness signal.
    Returns:
        dict with last_order_date, days_since_last_order
    """
    lh = udf.get_lakehouse_connection("GoldLakehouse")
    df = lh.execute_query(
        "SELECT MAX(OrderDateKey) AS last_order_date FROM dbo.fact_sales"
    )
    row = df.fetchone()
    last_date_str = str(row[0]) if row[0] else None
    if last_date_str:
        last_date = datetime.strptime(last_date_str, "%Y%m%d")
        days_since = (datetime.utcnow() - last_date).days
    else:
        days_since = -1
    return {
        "last_order_date":       last_date_str,
        "days_since_last_order": days_since,
    }

Learner note — why get_gold_freshness matters: The Power BI report shows data as of the last pipeline run, but there is no visual indicator of data age on the report itself. A freshness endpoint gives any downstream system — a monitoring alert, a CI job, a chatbot — the ability to check whether the Gold layer is stale before surfacing data to end users. This is a pattern borrowed from API design ("health check / readiness endpoint") applied to data products.

Step 3 — Generate and use the invocation code

Once the functions are published, Fabric generates ready-to-run caller code. The Generate invocation code dialog (toolbar → Generate invocation code) produces a complete Python script for testing from a client application:

Generate invocation code dialog — Invoke from: Client application, Function name: get_sales_summary. Python code shown: from azure.identity import InteractiveBrowserCredential, import requests, import json. Token acquisition with scope https://analysis.windows.net/powerbi/api/user_impersonation. Comment: DO NOT USE IN PRODUCTION — for development purpose only to test the UDF endpoint.

Generate invocation code — the dialog produces a self-contained Python script. The InteractiveBrowserCredential flow is for local development testing only — it pops a browser window for interactive login. In production, replace this with a service principal using client credentials (app registration + client secret or managed identity).

The generated caller, ready to run from any Python environment with azure-identity and requests installed:

from azure.identity import InteractiveBrowserCredential
import requests
import json

# Development-only token acquisition — replace with service principal for production
app = InteractiveBrowserCredential()
scp = 'https://analysis.windows.net/powerbi/api/user_impersonation'
result = app.get_token(scp)

if not result.token:
    print('Error: Could not get access token')

headers = {
    'Authorization': f'Bearer {result.token}',
    'Content-Type': 'application/json'
}

endpoint = (
    'https://34c783dbd98d49be99820932cbb31c46.z34.userdatafunctions'
    '.fabric.microsoft.com/v1/workspaces/'
    '34c783db-d98d-49be-9982-0932cbb31c46/userDataFunctions/'
    'd7711db4-78e5-4d12-b768-3f884d5fd18a/functions/get_sales_summary/invoke'
)

request_body = {
    "startDate": "20220101",
    "endDate":   "20221231"
}

response = requests.post(endpoint, json=request_body, headers=headers)

if response.status_code == 200:
    print('Function invoked successfully!')
    print(json.dumps(response.json(), indent=4))
else:
    print('Status:', response.status_code)
    print('Response:', response.text)

A successful response returns a structured envelope:

{
    "functionName": "get_sales_summary",
    "invocationId": "a1b2c3d4-...",
    "status": "Succeeded",
    "output": {
        "total_sales": 109846302.87,
        "line_count": 121317,
        "startDate": "20220101",
        "end_date": "20221231"
    },
    "errors": []
}

Step 4 — Test with Postman using the OpenAPI specification

Fabric also generates a full OpenAPI 3.0 specification, ready to import directly into Postman or any API gateway:

Generate OpenAPI specification dialog — JSON/YAML toggle at top. JSON spec shown: openapi 3.0.0, info title Fabric User Data Functions version 0.1.0. Servers section shows the full UDF base URL. Paths section: /get_sales_summary/invoke post operation with description listing Args and Returns. /get_gold_freshness/invoke also listed.

OpenAPI specification — available as JSON or YAML. Import this file into Postman using File → Import → Raw text. The spec defines the base server URL, all exposed function paths (/get_sales_summary/invoke and /get_gold_freshness/invoke), request/response schemas with all error codes (400, 401, 403, 408, 409, 413, 500), and the bearerAuth security scheme.

To test in Postman:

  1. Copy the full OpenAPI JSON from the dialog (Copy button)
  2. In Postman: Import → Raw text → paste → Continue
  3. Add a Bearer Token Authorization header — obtain a token via Azure CLI: az login az account get-access-token --resource https://analysis.windows.net/powerbi/api --query accessToken -o tsv
  4. POST to /get_sales_summary/invoke with body {"startDate": "20220101", "endDate": "20221231"}
  5. POST to /get_gold_freshness/invoke with an empty body {}

What User Data Functions add to the architecture

Access pattern Tool Auth Use case
Interactive BI report Power BI / Fabric Reports Entra ID / RLS Business users, executives
Scheduled data movement Fabric Pipeline Managed identity ETL, downstream feeds
Programmatic API call User Data Function Bearer token (Entra) Apps, automations, external consumers
Ad hoc notebook query PySpark / SQL Analytics Entra ID Data engineers, analysts

User Data Functions are not a replacement for the report or the pipeline — they are the data product API layer for teams that want to consume Gold data without a Power BI license or a Fabric workspace. An external Python application, a Power Automate flow, or a custom web dashboard can all call the same authenticated endpoint and receive structured JSON from the same Gold Lakehouse that the report visualizes.


Part 6 — FinOps: Turning "This Notebook Feels Slow" Into a Measured Number

Before starting this part: At least one complete end-to-end pipeline run must have executed and completed so there is CU consumption data to measure. Install the Fabric Capacity Metrics app from AppSource — this is a Microsoft-published Power BI app that requires no Fabric admin rights to install, but the workspace must have been running for at least a few days to show meaningful 14-day trend data.

Step 1 — Connect the Fabric Capacity Metrics app

The Fabric Capacity Metrics app is a Microsoft-published Power BI report available from AppSource. Install it, connect it to your Fabric capacity, and it exposes a 14-day rolling view of CU consumption broken down by workspace, item type, and item name.

Two concepts to lock in before reading any numbers:

  • CU-seconds is the billing unit — the area under the compute curve over time, not a point-in-time reading
  • Session Duration ≠ CU-seconds. A notebook that finishes in 200 seconds can consume 4,000 CU-seconds if it runs a heavy Spark job. Duration tells you how long you waited; CU-seconds tells you what it cost.

The app has two particularly useful pages: Compute (the raw 14-day item table) and Item History (Preview) (per-item % breakdown, status distribution, and CU by date). Both are needed to get the full picture.

Step 2 — Identify the hotspots (Compute tab)

Opening the Compute tab and sorting by CU(s) descending immediately surfaced the problem:

Fabric Capacity Metrics Compute tab — 14-day view showing nb_gold_fact_sales and nb_silver_sales_order at the top

Compute tab, 14-day window — two notebooks dominate. Avg utilization 0.98%, peak 4.62% on an F64 SKU.

The data from this view:

Item Item kind CU (14 days) % of Total
nb_gold_fact_sales SynapseNotebook 114,459.97 31.8%
nb_silver_sales_order SynapseNotebook 106,865.19 29.7%
pl_bronze_ingest_adventureworks Pipeline 31,923.92 8.9%
nb_silver_address SynapseNotebook 26,347.06 7.3%
nb_silver_employee SynapseNotebook 22,229.43 6.2%
nb_silver_product SynapseNotebook 15,668.06 4.4%
nb_dq_utils SynapseNotebook 15,106.21 4.2%
nb_silver_territory SynapseNotebook 12,782.61 3.6%
Everything else Various 14,050.22 3.9%
Total 359,433.69 100%

Two notebooks — nb_gold_fact_sales and nb_silver_sales_order — consumed 61.6% of all CU across the entire workspace over 14 days. The other five Silver notebooks, the pipeline, and both Lakehouses shared the remaining 38.4%. Two numbers also stand out in the header row: avg utilization 0.98%, peak utilization 4.62% — on an F64 SKU that provides 64 CU. The entire workload's worst single moment consumed less than 3 CU.

Step 3 — Confirm with Item History (and find the real story)

The Item History (Preview) tab adds two things the Compute tab doesn't show: a per-item percentage breakdown, and the status distribution across all runs.

Fabric Capacity Metrics Item History — % compute by item, 99.81% success rate, CU by date chart

Item History — 99.81% success rate confirms the notebooks are completing correctly. The cost concentration in two items (31.84% + 30.29%) is the issue.

The % breakdown of status: 99.81% success, 0.09% failure, 0.08% stopped. The notebooks were completing correctly — this was not a reliability problem, it was an efficiency problem. Two items accounting for 62% of compute cost while running successfully is the definition of "optimize, don't debug."

The Item details panel confirms the concentration:

Item % Compute
nb_gold_fact_sales 31.84%
nb_silver_sales_order 30.29%
pl_bronze_ingest_adventureworks 8.67%
nb_silver_address 7.33%
nb_silver_employee 6.49%

The CU(s) by date chart (bottom right of Item History) shows the burst concentrating on Jun 18–20 — the intensive testing window — which also explains why the 14-day aggregate is dominated by those few days rather than being evenly distributed.

Step 4 — Right-size the capacity (before touching any code)

The capacity sizing question was already answerable before any code change. Looking at a single clean pipeline run in the Compute tab:

Fabric Capacity Metrics single-run day — avg 0.02%, peak 0.23% on F64

Single-run day — avg utilization 0.02%, peak 0.23% on F64. The pipeline dominates at 12,000 of 12,502 total CU for that day.

On a single clean run: avg 0.02%, peak 0.23% of F64 capacity. The pipeline activity alone consumed 12,000.97 of 12,502.75 CU that day (96%), with the Lakehouses and Warehouses splitting the remainder. Even across the most intensive 14-day window, peak utilization never exceeded 4.62% of F64.

The maths:

  • F64 = 64 CU; 4.62% peak = 2.96 CU actual peak draw
  • F8 = 8 CU; 2.96 CU peak = 37% of F8 capacity — comfortable headroom
  • Cost delta: (64 − 8) / 64 = 87.5% capacity-cost reduction

This is the right-sizing recommendation: F8, derived from measured peak draw, not from a capacity planning spreadsheet or a rule of thumb.

Step 5 — Diagnose the root cause

Spark uses lazy evaluation: every transformation (.filter(), .join(), .withColumn()) builds a logical plan but executes nothing. Only actions.count(), .collect(), .write() — actually trigger computation and bill CU-seconds.

That distinction is easy to acknowledge in theory and easy to forget in practice, especially in data-quality code where it feels natural to write a .count() after every check "just to confirm it looks right." The actual cost of that habit: a single dataframe in the Silver sales-order notebook was being recomputed from scratch up to eleven times across one DQ block, because nothing told Spark to cache the result between checks. Every null check, every duplicate-key check, every referential-integrity check was silently re-running the same multi-way join from zero.

One notable finding: the same anti-pattern appeared on a 10-row dimension table with zero joins — proof that the cost was never about data volume. It was about redundant Spark job-launch overhead, which the cluster bills for regardless of table size.

Step 6 — Apply the fix

The fix pattern, applied consistently across all five Silver notebooks and the shared DQ utility module:

# Cache once, immediately before any DQ checks begin
df_sales_order = df_sales_order.cache()

# One consolidated null check — not seven separate .count() calls
null_counts = df_sales_order.select([
    count(when(col(c).isNull(), c)).alias(c)
    for c in df_sales_order.columns
]).collect()[0].asDict()

log_dq_result("null_check", null_counts)

# Referential integrity — one action, reads from the cached frame
orphan_count = (
    df_sales_order
    .join(df_product, "ProductID", "left_anti")
    .count()
)
log_dq_result("referential_integrity_product", {"orphan_rows": orphan_count})

# Write to Silver — still reads from the cached frame
df_sales_order.write.format("delta").mode("overwrite").save(silver_path)

# Release memory after the write completes
df_sales_order.unpersist()

Four changes, applied systematically:

  1. Cache once immediately before the first DQ action — every subsequent action reads the materialized result
  2. Consolidate redundant checks — seven separate per-column null checks collapsed into a single aggregation pass
  3. Eliminate duplicate calls — multiple checks were computing the same value twice (once for pass/fail, once for the log message) with no caching between them
  4. Unpersist after the write — releases the cached frame from memory cleanly

Step 7 — Measure the result

After applying the fix and running the pipeline once cleanly, the per-run CU-seconds were read from the TimePoint Detail drill-down in the Capacity Metrics app:

Notebook Before (CU-sec/run) After (CU-sec/run) Reduction
nb_gold_fact_sales 18,795.62 4,885.04 74.0%
nb_silver_sales_order 19,242.14 9,761.57 49.3%
Combined 38,037.76 14,646.61 61.5%

Worth being precise about two figures that look like they should match and don't. The 61.6% from Step 2 is a 14-day blended attribution — it answers "where did cost concentrate across all runs." The 61.5% here is a clean, single-run, per-run-normalized before/after — it answers "did the fix work and by how much." Getting the second number right required ruling out three measurement traps one by one:

  1. Duration ≠ CU-seconds — do not use session duration as a proxy
  2. Cross-filter state — the capacity dashboard can be silently filtered to one item while still showing what looks like a workspace total; always check the filter pane
  3. Multi-run days — a single calendar day's CU total can bundle multiple pipeline runs; one "before" day had five runs compressed into one bar, which would have overstated the reduction by roughly 5× if used directly

Once each of those was ruled out, 61.5% is the number that holds up — and the 14-day aggregate view in the Capacity Metrics app confirms the overall workload profile shifted accordingly.


Part 7 — Orchestration: Wiring Five Notebooks Without Losing Control of Cost or Visibility

Before starting this part: All five Silver notebooks (nb_silver_sales_order, nb_silver_address, nb_silver_employee, nb_silver_product, nb_silver_territory) and the Gold notebook (nb_gold_fact_sales) must be individually tested and running without errors. The Bronze ingestion pipeline must be verified working. Complete the FinOps caching fix from Part 6 before building the orchestration — otherwise the pipeline will schedule the inefficient version.

With governance, reporting, and FinOps all in place independently, the next problem was tying the whole chain into one schedulable pipeline. The design: a fan-out from Bronze ingestion into five parallel Silver notebooks, converging into a single Gold fan-in, with Fabric's AND-logic dependency handling the convergence automatically once all five success arrows are wired in.

pl_full_medallion_run — Bronze, five parallel Silver notebooks, and Gold Fact Sales all succeeded

pl_full_medallion_run after a clean end-to-end pass — all six activities succeeded

The ForEach decision

Collapsing the five Silver notebooks into a single ForEach activity was considered and rejected. Fabric's "rerun from failed activity" reruns an entire ForEach loop — including iterations that already succeeded — with no native way to retry only the one that failed. The FinOps work above was specifically about eliminating wasted compute; burning CU-seconds re-running four notebooks that already passed just to retry the one that didn't would have undone exactly that discipline. Five separate activities cost more canvas space. They buy back selective reruns and per-notebook visibility in the monitor. For this project's priorities, that is the right trade-off.

Setting Up Workspace Identity and the Pipeline Connection

Before the pipeline activities can authenticate without user credentials, two infrastructure layers need to be in place: a tenant-level permission and a workspace-level identity. Setting these up is the prerequisite that the Connection field on Notebook and pipeline activities depends on.

Step 1 — Enable the tenant setting for service principals

In Admin portal → Tenant settings → Developer settings, enable "Service principals can call Fabric public APIs" and scope it to the AW-POC-Admins security group:

Admin portal — Service principals can call Fabric public APIs enabled for AW-POC-Admins security group

Developer settings — "Service principals can call Fabric public APIs" enabled for the AW-POC-Admins group. Without this, any Workspace Identity or service principal attempting to authenticate against Fabric APIs will be rejected at the tenant level regardless of workspace role.

This setting is the gate that determines whether service principals (including Workspace Identities) can interact with Fabric at all. It is separate from workspace roles and separate from Entra group membership — all three layers must be consistent.

Step 2 — Create the Workspace Identity

In Workspace Settings → Workspace identity, click + Workspace identity to create the managed identity for this workspace:

Workspace Settings — Create a workspace identity button before any identity exists

Before creation — the Workspace identity tab shows only the creation button. One click provisions the managed identity; no naming required as it takes the workspace name automatically.

Once created, the identity details page shows the provisioned service principal and its authorized users:

Workspace identity WS_LC-Dev — Active, authorized users: Merlin Fomen (Can edit members), AW-POC-Admins (Can edit members), AW-POC-Analysts (Can use identity), AW-POC-Viewers (Can use identity)

Workspace identity WS_LC-Dev — State: Active. Authorized users: Merlin Fomen and AW-POC-Admins with "Can edit members"; AW-POC-Analysts and AW-POC-Viewers with "Can use identity". All three security groups are present, giving each role the correct level of access to the identity.

The identity is named WS_LC-Dev (matching the workspace), with Object ID 0e581b29-5552-490a-8b00-b841b37f4b6d and App ID 6124d774-5674-4ba7-9004-8c5e74a40b1e. These IDs are what appear in Entra and in audit logs when the identity authenticates.

The permission model for authorized users is deliberately tiered:

Principal Permission What it means
Merlin Fomen Can edit members Can add/remove authorized users
AW-POC-Admins Can edit members Group-level admin over identity membership
AW-POC-Analysts Can use identity Can authenticate as this identity (e.g. for notebooks)
AW-POC-Viewers Can use identity Can authenticate as this identity (read-only context)

Step 3 — Add the service principal directly to the Entra security group

This is the step most documentation skips, and the one that caused the permissions error in this project. The Workspace Identity being listed in Workspace Settings is not sufficient for it to authenticate through a connection that itself is scoped to a security group. The service principal must be a direct member of the Entra security group that governs the permission boundary.

In Entra ID → Groups → AW-POC-Admins → Members:

AW-POC-Admins group members in Entra — AW-POC-Admins (nested group), Merlin Fomen (User), and WS_LC-Dev (Service principal) with Object ID 0e581b29...

AW-POC-Admins members in Entra — three entries: the group itself (nested), Merlin Fomen (user, guyfomen@lanacloud.com), and WS_LC-Dev (service principal, Object ID: 0e581b29...). The Object ID matches the Workspace Identity ID exactly.

Three members are visible: the group itself (a nested reference), Merlin Fomen as a direct user member, and WS_LC-Dev as a Service principal — the Workspace Identity, identifiable by its Object ID (0e581b29-5552-490a-8b00-b841b37f4b6d) which matches what the Workspace Settings panel shows. Without this entry, any pipeline activity running under the Workspace Identity that hits a permission boundary checked at the Entra group level will fail — even if the workspace role assignment looks correct from the Fabric UI.

Step 4 — Create the nb_connection for Fabric Data Pipelines

With the identity in place, create the connection object that pipeline activities can reference. In Manage connections and gateways → New connection → Cloud:

New connection — Cloud type, connection name nb_connection, Fabric Data Pipelines, Workspace identity authentication, Allow Code-First Artifacts checked, Privacy level: Organizational

New connection — Cloud type, Fabric Data Pipelines connector, Workspace identity authentication. "Allow Code-First Artifacts like Notebooks to access this connection (Preview)" checked — this is what makes the connection usable by Notebook activities. Privacy level: Organizational.

Key settings on this form: - Connection type: Cloud — not on-premises; Workspace Identity authenticates through Azure, not through a gateway - Connection name: nb_connection — used by Invoke Pipeline (Preview) activities - Authentication method: Workspace identity — the WS_LC-Dev managed identity authenticates the pipeline run - "Allow Code-First Artifacts like Notebooks to access this connection" (Preview) — must be checked for Notebook activities to reference this connection. This is the checkbox that makes the connection visible in the Notebook activity's Connection field dropdown

The yellow note on the form is worth reading: Workspace Identity authentication is currently supported for Dataflows Gen2 with CI/CD, Data pipelines, OneLake shortcuts, Notebooks, and Semantic models — the specific activity types that can use this connection.

Full connection inventory

After all connection objects are created across the project, the Manage Connections and Gateways page serves as a project-level infrastructure checklist:

Manage Connections and Gateways — showing all 6 project connections: CapacityMetricsCES, conn_adventureworks_onprem, lanacloud-github, MetricsDataConnector, nb_connection, test_fabric_repo

Manage Connections and Gateways — the full connection inventory for this project. Every external integration is visible in one place with its type, status, gateway (where applicable), and last credentials used timestamp.

Connection Type Purpose
CapacityMetricsCES Capacity Metrics Powers the Fabric Capacity Metrics app (Part 6)
conn_adventureworks_onprem SQL Server On-premises AdventureWorks source via lc_on_prem_gateway
lanacloud-github GitHub - Source control Git integration for the workspace (Part 5)
MetricsDataConnector MetricsDataConnector Metrics telemetry connection
nb_connection Fabric Data Pipelines Workspace Identity auth for pipeline/notebook activities
test_fabric_repo GitHub - Source control Test repository used during Git integration debugging

All six show green (healthy) status. The conn_adventureworks_onprem entry confirms the on-premises gateway (lc_on_prem_gateway) is active and credentials were last used on 6/24/26 at 10:43 AM — evidence of a successful pipeline run.

Two bugs that look similar and aren't

Wiring the Bronze step into the orchestration pipeline surfaced two distinct platform issues:

First: the Notebook activity's Connection field — meant to reference nb_connection and authenticate via Workspace Identity — stayed empty in the dropdown regardless of configuration, a confirmed platform rollout gap. The nb_connection object, the Workspace Identity, and the Entra group membership are all correctly in place (documented above); the activity's field simply does not yet populate from them on this backend cluster. It is an optional field, so all six Notebook activities run with it blank for now — the infrastructure will be used automatically once the rollout reaches this workspace.

Second: the Invoke Pipeline (Legacy) activity cannot show the child pipeline's progress while running. Switching to Invoke Pipeline (Preview) fixed the visibility issue, but its Connection field is required — a different rule than the Notebook activity, on the same general feature. Using nb_connection here worked, but surfaced a permissions error: the Workspace Identity was listed in Workspace Settings as an authorized user, yet the pipeline still failed to authenticate. The root cause was the missing Entra group membership shown in secur_indentity_4.png — the service principal must be a direct member of the Entra group, not merely referenced via Workspace Settings. Adding WS_LC-Dev directly to AW-POC-Admins in Entra (Step 3 above) resolved it. Same shape of problem as the certification work in Part 4: indirect membership is not equivalent to direct access, and Fabric checks at the Entra layer.

With both resolved, the pipeline runs on a fixed weekly schedule — Sunday at midnight Eastern — closing the loop from "things I run manually" to "things that run themselves."


Part 8 — Closing the Loop: Observability and Alerting

Before starting this part: The orchestration pipeline pl_full_medallion_run must be created, scheduled, and have completed at least one run (Part 7 complete). Workspace admin access is required to enable Workspace Monitoring. A Data Activator item will be created in this part — it requires no additional licences on an F-SKU capacity.

A pipeline that runs on a schedule with nobody watching it isn't finished — it just fails quietly instead of loudly. The last piece was making failures visible without babysitting the workspace.

Workspace Monitoring + KQL

Step 1 — Enable Workspace Monitoring and understand what it provisions

In Workspace Settings → Workspace Monitoring, flip the toggle on. Fabric automatically provisions:

  • A Monitoring Eventhouse — a real-time analytics database backed by Kusto/KQL
  • A Monitoring KQL database inside the Eventhouse — pre-populated with a rich table catalog
  • A Monitoring Eventstream — built on a native FabricMonitoring source connector that continuously streams workspace telemetry into the Eventhouse

The Eventstream architecture is worth understanding before you write a single KQL query:

Monitoring_Eventstream — FabricMonitoring source, Monitoring_Eventstream stream, FabricMonitoringDestination, with JSON data preview showing raw telemetry records

The Eventstream architecture: FabricMonitoring (built-in source, Active) → Monitoring_Eventstream → FabricMonitoringDestination (Active). The Data preview tab shows raw JSON telemetry records — cluster FQDN, node name, timestamp, and Azure resource ID — flowing continuously.

The Data insights tab shows the live message throughput — IncomingMessages vs OutgoingMessages over the last hour, spiking during pipeline run windows:

Monitoring_Eventstream Data insights — incoming and outgoing message count chart over the last hour

Data insights — message count spikes correspond to pipeline activity windows. The stream is active and routing telemetry from the FabricMonitoring source to the Eventhouse destination continuously.

Step 2 — Check the Eventhouse System Overview before writing queries

Before querying the data, open the Monitoring Eventhouse → System overview. It gives you a health dashboard for the monitoring infrastructure itself:

Monitoring Eventhouse System overview — storage 752.1 KB compressed from 3.2 MB, 84 minutes activity, 2K ingested rows, Advisor finding about Data Activator query spike

System overview — 3.2 MB of raw telemetry compressed to 752.1 KB (76% compression). Activity breakdown: KQL QuerySet, Data Activator, and System processes. Advisor finding: Data Activator is driving ~54% of total query load — expected, since the Activator polls every hour.

Several details from this screen are worth noting:

  • Compression ratio: 76% — Eventhouse's columnar storage compresses telemetry data aggressively. 3.2 MB of raw JSON events lands as 752.1 KB on disk. This is not unusual for structured log data with many repeated field names and values.
  • Always-on: Off — the Eventhouse pauses when idle, releasing capacity CU. For a monitoring workload driven by occasional queries and hourly Activator polls, this is the correct setting. Turning it on would keep the cluster warm 24/7 at unnecessary cost.
  • Advisor finding: Query spike — the advisor flagged that Data Activator is driving unusually high query activity, contributing ~54% of total usage. This is expected behaviour — the Activator rule polls the KQL database every hour. It is not a problem, but knowing the Eventhouse advisor surfaces this automatically means you can use it to catch genuinely unexpected query patterns in production.
  • Activity breakdown shows four consumers: KQL QuerySet (your manual queries), Data Activator (the alerting rule), and two system processes (Storage Cleaning, Shard Management). All are visible and attributable.

Step 3 — Understand the full table catalog before narrowing to ItemJobEventLogs

The Monitoring KQL database contains far more than the one table mentioned in most tutorials:

Monitoring KQL database overview — full table catalog including ItemJobEventLogs, SemanticModelLogs, EventhouseMetrics, GraphQLLogs, and more. Data Activity Tracker showing 3.2K rows ingested and 235 queries over 3 days.

Monitoring KQL database — the full table catalog (left panel) includes pipeline logs, Eventhouse self-monitoring, semantic model query logs, and more. Data Activity Tracker: 3.2K ingestion rows and 235 queries over 3 days. SemanticModelLogs: 1.49K rows ingested today.

The full table catalog visible in the left panel:

Table What it captures
ItemJobEventLogs Pipeline and notebook job start/end events — the table used in this blog
SemanticModelLogs Power BI semantic model query activity — who queried what, when, how long
EventhouseMetrics Eventhouse cluster-level performance metrics
EventhouseQueryLogs Every KQL query run against the Eventhouse
EventhouseCommandLogs Schema changes and admin commands
EventhouseIngestionResults Ingestion success/failure per batch
GraphQLLogs / GraphQLMetrics Activity on any GraphQL endpoints in the workspace
CopyJobActivityRunDetails Copy activity-level details for Data Factory pipelines
MirroredDatabaseLogs Mirrored database sync events
RawLogs Raw platform telemetry (lower-level than ItemJobEventLogs)

This catalog is the full monitoring surface of a Fabric workspace — pipelines, notebooks, semantic models, Eventhouse internals, and GraphQL endpoints all in one queryable database. ItemJobEventLogs is the entry point for pipeline monitoring, but SemanticModelLogs is equally valuable once the executive report is in production and you want to understand query performance and user adoption.

Step 4 — Explore with the Stats tab before building production queries

Before using a KQL query in Activator or saving it to a dashboard, validate it with the Stats tab in the KQL Queryset. It shows execution time, memory, CPU, and cache behaviour — the same metrics that determine whether a query is appropriate for high-frequency polling:

KQL Queryset Stats tab — 7.9ms execution time, 100% disk cache hits, 25.52 MB memory, 3 rows returned

Stats tab — execution time: 7.9ms. Disk cache: 100% hits (entire green bar, no misses). Memory: 25.52 MB peak. 3 rows, 275 bytes returned. This query is well within the bounds of safe hourly polling.

The query running here is a simple pipeline log exploration:

ItemJobEventLogs
| where Timestamp > ago(7d)
| where JobType == "Pipeline"
| project Timestamp, ItemName, WorkspaceName, JobStartTime, JobEndTime, JobStatus
| order by Timestamp desc

Execution time: 7.9 milliseconds. Disk cache: 100% hits — the monitoring data is hot in Eventhouse memory, meaning repeated queries by Activator or interactive analysts do not re-read from disk. Memory footprint: 25.52 MB. This is the performance profile of a database designed for fast, repeated operational queries — not batch analytics.

Step 5 — The ItemJobEventLogs schema gotcha

The schema has one critical detail to understand before writing anything more complex than a basic filter: ItemJobEventLogs logs one row per status transition — NotStarted → InProgress → Completed (or Failed) — not one row per run. Only the terminal row has both a JobStartTime and JobEndTime populated. Any aggregation that doesn't filter on isnotempty(JobEndTime) silently multiplies every run count by 3.

A reliable first query to understand what the table contains:

ItemJobEventLogs
| where Timestamp > ago(7d)
| where JobType == "Pipeline"
| project Timestamp, ItemName, WorkspaceName, JobStartTime, JobEndTime, JobStatus
| order by Timestamp desc

This returns only rows where a pipeline event occurred in the last 7 days, projected to the columns you actually need. From here you can add the isnotempty(JobEndTime) filter to isolate terminal rows before any duration or attribution calculations.

Step 6 — Cross-join attribution: parent and child pipelines

With the schema understood, a more sophisticated query correlates the parent pipeline's run window against the Bronze ingestion child pipeline's run window — since the Workspace Monitoring schema has no shared run ID linking parent to child:

KQL cross-join query correlating pl_full_medallion_run with pl_bronze_ingest_adventureworks by time containment — result: BronzeShareOfRunPct 50.5

Cross-join query result — ParentDurationSec: 927s, BronzeDurationSec: 468s, BronzeShareOfRunPct: 50.5. Done in 0.201s. Bronze Ingestion consumed exactly half the end-to-end wall-clock time.

That detail unlocked a cross-join correlating the parent pipeline's run window against its invoked child's run window by time containment — since there is no shared run ID linking them in this schema:

let parentRuns =
    ItemJobEventLogs
    | where ItemName == "pl_full_medallion_run" and JobType == "Pipeline"
    | where isnotempty(JobEndTime)
    | project ParentStart = JobStartTime, ParentEnd = JobEndTime, ParentStatus = JobStatus
    | extend JoinKey = 1;
let childRuns =
    ItemJobEventLogs
    | where ItemName == "pl_bronze_ingest_adventureworks" and JobType == "Pipeline"
    | where isnotempty(JobEndTime)
    | project ChildStart = JobStartTime, ChildEnd = JobEndTime, ChildStatus = JobStatus
    | extend JoinKey = 1;
parentRuns
| join kind=inner childRuns on JoinKey
| where ChildStart between (ParentStart .. ParentEnd)
| extend BronzeDurationSec    = datetime_diff('second', ChildEnd, ChildStart)
| extend ParentDurationSec    = datetime_diff('second', ParentEnd, ParentStart)
| extend BronzeShareOfRunPct  = round(100.0 * BronzeDurationSec / ParentDurationSec, 1)
| project ParentStart, ParentEnd, ParentStatus,
          ChildStart, ChildEnd, ChildStatus,
          BronzeDurationSec, ParentDurationSec, BronzeShareOfRunPct
| order by ParentStart desc
KQL cross-join result showing Bronze Ingestion accounts for 50.5% of pipeline runtime

Result: Bronze Ingestion accounts for 50.5% of the full orchestrated pipeline's total runtime

Against a real run, that returned a genuine, specific number: Bronze Ingestion accounts for 50.5% of the full orchestrated pipeline's runtime — half the total wall-clock time spent on ingestion alone, pulled from real log data rather than a generic demo statistic.

Activator alerting: a heartbeat pattern, not a failure-event pattern

With monitoring in place, the last piece was automatic failure notification. The approach here is worth explaining carefully, because it is architecturally different from what most tutorials show.

The naive approach to pipeline alerting is: watch for a failure event and send a notification when one appears. The pattern used here is the opposite — watch for a success heartbeat, and alert when that heartbeat goes silent. This is sometimes called a dead man's switch. It is more robust in practice because it catches failures that produce no event at all (hung processes, capacity throttling, silent timeouts) — not just failures that emit a clean error record.

Step 1 — Connect the event source

In My Activator, create a new event source of type KQL Queryset and point it at the Monitoring KQL database provisioned by Workspace Monitoring. Set the poll interval to 1 hour — matching the pipeline's weekly cadence with sufficient granularity to detect same-day issues.

The query powering the event source feeds Activator a stream of completed pipeline records:

ItemJobEventLogs
| extend SecondsAgo = datetime_diff('second', now(), ingestion_time())
| where JobType == "Pipeline" and JobStatus == "Completed"
| order by Timestamp desc
| project Timestamp, ItemName, WorkspaceName, JobStartTime, JobEndTime, JobStatus

This query filters for JobStatus == "Completed" — so every row Activator receives represents a successful pipeline completion. That successful completion is the heartbeat.

Activator Manage source tab showing the KQL query and 1-hour poll interval

Manage source — the KQL queryset polls ItemJobEventLogs every hour, projecting completed pipeline records as the heartbeat stream

Once connected, the Live feed tab confirms the event stream is receiving rows. Both pl_full_medallion_run and pl_bronze_ingest_adventureworks appear in the ItemName column — meaning Activator is tracking both the parent orchestration pipeline and the Bronze ingestion child pipeline from the same event source without any additional configuration.

Step 2 — Define the rule

Create a new rule on the event source. The critical choice is the condition type. Do not use "On every value" — that fires an alert every time a pipeline succeeds, which is noise, not signal.

The correct condition is "No presence of data" with a time elapsed of 10 minutes. In plain terms: if no completed-pipeline heartbeat arrives within a 10-minute window, the rule fires. Silence is the failure signal.

Activator rule live feed showing the complete rule definition with No presence of data condition

Rule definition — condition: "No presence of data / 10 minutes". Action: Email. The activation details table confirms real pipeline run timestamps are flowing through the event source.

The Definition panel (right side) summarises the complete rule:

Setting Value
Monitor Monitoring Eventhouse event
Condition No presence of data
Time elapsed 10 minutes
Action Email
Subject Activator alert pl_full_medallion_run alert
Headline The condition for 'pl_full_medallion_run alert' has been met

The Definition tab also labels the condition as "No heartbeat" — the exact mental model. As long as the pipeline completes on schedule, the heartbeat arrives and the rule stays quiet. The moment the pipeline fails, stalls, or never starts, the heartbeat stops and the alert fires within 10 minutes.

Step 3 — Verify in History

The History tab is the ground truth for whether the rule is working. Do not trust "Running" status alone — verify actual activations.

Activator rule History tab showing 19 total activations with activation count bar chart

History — 19 total activations confirmed today, clustered around 15:00, 16:00, and 18:00 UTC — correlating exactly with the pipeline run windows visible in the KQL event feed

The History tab showed 19 total activations (all this week, none the prior week — meaning the rule was new and the count is clean, not inherited from a prior configuration). The activation bar chart shows three distinct clusters corresponding to the three pipeline run windows executed during testing on June 26. Each cluster maps directly to a period where the event source received a burst of completed-pipeline records, confirming the rule is firing on real activity — not on background platform noise or test presses.


What This Build Demonstrates

This project covers the full production surface of Microsoft Fabric well beyond the quickstart happy path:

Capability What Was Demonstrated
Ingestion On-prem SQL Server → Bronze via gateway + scheduled pipeline
Transformation Medallion Bronze → Silver → Gold in PySpark with a shared DQ framework
Modeling Direct Lake semantic model, window functions, continuous calendar, fan-out resolution
Governance RLS + OLS enforced, OneLake security, certification workflow, audit logging
FinOps 61.5% compute reduction, F8 right-sizing recommendation from measured peak draw
Orchestration Fan-out / fan-in pipeline, selective-rerun-aware design, weekly schedule
Observability Workspace Monitoring KQL, cross-join attribution query, Activator email alerting
Platform debugging Backend bug isolated by controlled single-variable elimination test

The more important point is the method behind each one: every number is sourced, every security claim is tested against a real identity, every architectural decision has a documented reason, and every platform failure has a root cause — not just a workaround.

LanaCloud Assistant
Online · usually replies instantly