Microsoft Fabric Data Storage Options: A Comprehensive Deep Dive

Microsoft often defines Fabric as a ‘unified analytics platform’ that brings together data engineering, to storing and finally to build analyses for business consumption. The technology that enables this ‘unification’ is called OneLake which is an integrated, enterprise wide data lake that stores data in ‘open formats’ enabling collaboration across various tools necessary in the modern world, eliminating the ‘data silos’.

In this article I will go over all the data storing options that Fabric has, covering what each one is, why and when to use it (and when not to) and use cases with examples. The goal, while writing this article, is to provide a single place to answer all your questions around the many data storing tools/solutions Fabric has to offer.

Lakehouse

What is it?

Think of Lakehouse as a data lake meets data warehouse for storing both structured and unstructured data in one location. A Lakehouse combines the scalability of a data lake (handling high volume files and big data) with data warehouse type management and analytics features.

Why use it?

Lakehouses shine in big data and machine learning scenarios involving large volumes of unstructured or semi-structured data (e.g. JSON, CSV, images, Parquet files) and advanced analytics. They are ideal when you need to apply data engineering or data science on raw data while also allowing for SQL based analysis. This means you can have a data engineer bring in terabytes of data using PySpark using notebooks while also allowing analysts to use SQL queries on the curated data.
For instance, a content assessment team might use a Lakehouse to ingest raw data (JSON files or parquet datasets), refine and aggregate that data in Spark, and then allow business analysts to run Power BI reports directly on the resulting Delta tables through the SQL endpoint. In summary, choose a Lakehouse when you need the flexibility of a data lake for big data with the option of integrated SQL analytics on the results

How to use it?

Lakehouse Explorer - a web interface to upload and manage data, apply MIP labels and explore data using a built in viewer
Spark Notebooks - you can use notebooks (in Python, Spark SQL, etc.) to read and write data to the Lakehouse, perform ETL (extract transform load) operations, and train machine learning models.
Data Pipelines & Dataflows - use Fabric Data Factory pipelines (with Copy Activities) or Dataflows Gen2 to ingest data from various sources into the Lakehouse. These tools provide no-code or low-code methods for moving and transforming data.
SQL Interface - each Lakehouse provides a SQL Analytics Endpoint using which you can query all tables in the lakehouse. This allows you to query your Lakehouse data with T-SQL, create views or stored procedures on top of it, and even join it with other Fabric SQL-based sources, like warehouses. The Lakehouse’s SQL endpoint comes with a Power BI semantic model for easy reporting, so analysts can connect with Power BI or Excel and start analyzing data immediately.

Limitations

SQL endpoint is read only which means you cannot perform Insert/Update/Delete commands. Also, it only supports Delta format tables and files in other formats like CSV, must first be converted to Delta.
Lakehouse are designed to be more append-heavy. SO, while you can update or delete records in a Lakehouse, they will be not be as efficient as a traditional row based database.
Storage in a Lakehouse is file based so a very large number of small files can impact performance. Hence, you will need to optimize it periodically by combining smaller files into larger ones or following medallion architectural concept of bronze/silver/gold layers.

Best suited for

Data engineering, data science, and exploratory analytics.
Large scale ETL pipelines
Machine learning workloads
As a data lake

Warehouse

What is it?

Warehouse is a relational data warehouse built on OneLake and stores data as Delta Parquet files (same as the Lakehouse) while providing the user experience of a traditional SQL data warehouse. A Fabric Warehouse supports T-SQL functionality including ACID transactions, stored procedures, user-defined function and views. In brief, you get fast analytics like a traditional enterprise data warehouse, but your data is not locked in a closed or proprietary system.

Why use it?

Use a Fabric Warehouse when you need high performance analytics. So, you can build enterprise level BI solutions with star or snowflake schemas and reports for business intelligence. Because it provides full T-SQL support, the Warehouse is perfect for teams with SQL expertise or existing SQL data warehouse/BI solutions, as they can migrate with minimal rework.

How to use it?

The warehouse can be populated via multiple methods: T-SQL (copy into commands), Data Factory pipelines and Dataflows Gen2 (for ETL), or even Spark jobs that write directly to the warehouse tables. Warehouse can also be managed through external tools similar to Azure SQL or on-prem SQL server databases using SSMS or VS Code.
The Warehouse integrates with other Fabric items. Its data is in OneLake, so a data engineer could, for example, read or write to warehouse tables using a Spark notebook. The Warehouse also has a built-in SQL analytics endpoint (essentially the warehouse itself) that supports cross-database queries, so you can join Warehouse tables with tables from other warehouses or Lakehouse Delta tables in the same workspace using three-part names (e.g. Database.Schema.Table). This fosters a virtually unified data warehouse experience across different Fabric item.

Limitations

Because the Warehouse enforces relational schemas and uses Delta files, it is best suited for structured or semi-structured data.
Warehouse is a managed service which means that while you don't have to worry about the tuning the underlying hardware, you are limited to Fabric's capacity metrics and performance tiers.
Warehouse is optimized for OLAP rather than OLTP. OLAP stands for Online Analytical Processing, and it’s all about complex queries and data analysis for business intelligence. It’s optimized for read-heavy operations. On the other hand, OLTP stands for Online Transaction Processing, which focuses on handling a large number of simple, fast transactions, like in banking or retail systems. So, OLAP is for analysis, OLTP is for transactions. This means for heavy transactional workloads with thousands of small updates per second, consider using the Fabric SQL Database or Azure SQL DB, and reserve the Warehouse for downstream analytics.

Best suited for

Enterprise BI and reporting on large, clean, structured datasets because it excels at managing dimensional models (think Star schema as an example).

Eventhouse

What is it?

An Eventhouse is a solution for real-time analytical data based on the Kusto engine, the same technology behind Azure Data Explorer. It is optimized for ingesting and querying large volumes of streaming data such as logs, IoT feeds, etc. Each KQL (Kusto Query Language) database inside an Eventhouse can ingest structured, semi-structured (JSON, text), or unstructured data, and is automatically indexed and partitioned by time. The Eventhouse also monitors ingestion rates, query throughput, and storage usage.

Why use it?

Choose an Eventhouse for scenarios requiring real-time or near-real-time insights from high-volume data streams. For instance, if you need to analyze log files, telemetry and sensor data, application clickstreams, or security events in real time, an Eventhouse is purpose-built for this task. It offers extremely fast ingestion and querying via Kusto’s powerful indexing and query engine.
A Microsoft example scenario describes Daisy, a business analyst dealing with supply chain data comprising billions of rows, coming from IoT sensors and logs.

How to use it?

Data ingestion: You can ingest streaming data into an Eventhouse database from various sources for example, from IoT Hub or Azure Event Hubs directly into an Eventhouse. You can also use data pipelines, SDKs, REST API, or connectors for Spark, Kafka, etc., to feed data continuously.
Querying: Data in an Eventhouse is queried using Kusto Query Language (KQL) – a SQL-like query language optimized for analytics. You can also use Power BI’s DirectQuery to run KQL against the Eventhouse.

Limitations

Because Eventhouse is designed to be used for streaming data, data is typically appended as streams of immutable events. Hence, the strength of Eventhouse lies in fast ingestion and querying rather than transaction processing.
Although this is not a limitation (with the advent of LLMs which can write code), you do need an understanding of KQL which has its own syntax and functions. However, KQL is still similar to SQL in its basic structure so the learning curve is not as steep.

Best suited for

Telemetry and IoT analytics: e.g. analyzing sensor data from thousands of IoT devices in real time to identify anomalies or trends.
Clickstream and user behavior analytics: processing website or app events to derive insights.
Time-series analytics: financial transactions, stock ticker data, etc. where you need to run queries over sliding time windows.

SQL DB

What is it?

SQL Database in Microsoft Fabric runs on the same underlying engine as Azure SQL Database, hence supports T-SQL, relational storage and ACID transactions. Behind the scenes, it’s essentially Azure SQL DB that automatically replicates its data into OneLake in near real-time.

Why use it?

You should use a Fabric SQL Database when you need a transactional or operational data storage. It’s essentially the solution for scenarios that previously required a separate OLTP database. By hosting an OLTP database inside Fabric, you get the best of both worlds - a database for business applications that can handle transactional data and support ACID transactions, and which can also be used for analytics without relying on complex ETL.
It automatically scales to handle load and optimizes performance by adding indexes as needed.

How to use it?

You can connect with any SQL client (SSMS, VS Code, etc.) just like a regular Azure SQL database and supports all T-SQL features since it is built on SQL Server engine.
As data is inserted/updated in the SQL Database, Fabric makes sure that it is replicated to OneLake Delta/Parquet format for analytics. This means OLTP data is quickly available to other Fabric components.

Limitations

As of early 2026, SQL DBs are relatively new addition to the Fabric suite of tools and as such some features available in Azure SQL might not be fully available.
Prioritizes OLTP workloads over analytical queries. While it can run analytical queries and its data can be joined with another warehouse or lakehouse tables, it should generally be reserved for OLTP transactions.

Best suited for

Use Fabric SQL DB when you need to a traditional SQL DB primed for OLTP workloads while also making the data available for analytics without breaking a sweat.

Cosmos DB

What is it?

Cosmos DB in Microsoft Fabric leverages the same engine and infrastructure as Azure Cosmos DB for NoSQL, but is integrated into OneLake. Cosmos DB in Fabric supports storing semi-structured or unstructured data and all characteristics of Azure Cosmos DB. Also, it is “AI-optimized”, meaning it includes features for AI scenarios such as vector search for embeddings, and is designed to serve as a data store for AI-driven applications and analytics solutions.

Why use it?

Cosmos DB is best suited for scenarios where data is hierarchical or is always evolving, because unlike a relational database, one does not need to alter schema for new fields and the data model can adapt.
Cosmos DB is known for its ability to handle large amounts of data and its near instant limitless scaling all the while providing low latency making it ideal for web or mobile based applications.
Ships with vector similarity search (for example, storing vector embeddings of documents or images for search) making it ideal for AI applications.

How to use it?

You can write data using the Cosmos DB SDKs or REST API and can be accessed with SQL API (NoSQL query language). Additionally, the data is available in OneLake (if you enable the OneLake integration, which is on by default) as Delta tables. This allows you to query the same data via T-SQL or Spark if needed. For example, you could create a shortcut to a Cosmos DB container’s mirrored table in a Lakehouse and run Spark SQL on it, or query it from a Warehouse’s SQL endpoint and even join with relational data.
You can also use Azure OpenAI to generate embeddings for documents and store those vectors in a Cosmos DB Fabric container then use vector search capability to find similar items which is especially useful in recommendation systems (think of how Amazon recommends products based on your search and order history).

Limitations

Cosmos DB, fundamentally, is not a relational data store. This means that if you require complex multi-item updates or SQL joins, a relational store such as SQL DB or Warehouse might be more appropriate.
Similar to Fabric SQL DB, Cosmos DB is a relatively new feature and not all APIs are currently available in Fabric as they are in Azure Cosmos DB.

Best suited for

When you are building a new application where data schema is fluid, and you need fast scaling capabilities along with the data being ready for analytical workloads downstream.
When you need to build AI features utilizing vector search capabilities in your solution in Fabric.

Mirrored Snowflake Database

What is it?

Snowflake Database inside Fabric is an integration feature called Mirroring. Mirroring makes a dataset available in OneLake while continuously synchronizing it with data from an external Snowflake data warehouse in near real time. It is only a read-only copy of the Snowflake data which can be used by Fabric tools (Power BI, Spark Notebooks, etc.).

Why use it?

If one has a significant amount of data and investment in Snowflake, mirroring lets you avoid re-building those datasets by making it available to Fabric workloads directly providing a bridge between the two platforms. Moreover, you do not have to build ETL pipeline jobs to pull data, it automatically replicates the changes continuously.

How to use it?

You can add a mirrored Snowflake database using the connection details of your Snowflake account. Once configured, data is immediately available for you to use with Fabric workloads. The database comes with a read only SQL endpoint for querying and the data will keep itself up to date automatically. For example, you could write a single T-SQL query that joins a mirrored Snowflake with a native Fabric Warehouse which can truly unlock the power of this integration between Fabric and Snowflake.

Limitations

Data is read only which means you cannot modify the tables in Snowflake from the Fabric side and any changes to the data must first be made to Snowflake for it flow through to Fabric via mirroring.
Fabric only supports tables that are in Iceberg format and hence certain Snowflake features might not translate through mirroring.
While mirroring aims for real time replication of data, there can be some slight delays depending on the volume of the data.

Best suited for

When you have a large portion of data already present in Snowflake but want to use Power BI and other Fabric workloads for reporting or analytics. Through mirroring the users will be able to interact with Snowflake data with no visible difference.
If you gradually want to move to Fabric and wish to do so in a measured approach and avoid a big lift-and-shift.

Hopefully this article provides you with much needed clarity about the different data stores available in Fabric, how and when to use them while also considering their limitations. Since this was a longer article and you made it all the way through, treat yourself to some ice cream!

Command Palette

Lakehouse

What is it?

Why use it?

How to use it?

Limitations

Best suited for

Warehouse

What is it?

Why use it?

How to use it?

Limitations

Best suited for

Eventhouse

What is it?

Why use it?

How to use it?

Limitations

Best suited for

SQL DB

What is it?

Why use it?

How to use it?

Limitations

Best suited for

Cosmos DB

What is it?

Why use it?

How to use it?

Limitations

Best suited for

Mirrored Snowflake Database

What is it?

Why use it?

How to use it?

Limitations

Best suited for

Comments