Data Lakehouse

According to LaPlante (2021), a lakehouse combines elements of data lakes and data warehouses to build something new:

  • Lakehouses have similar data structures and data management features to data warehouses, but use the low-cost, flexible storage of data lakes.
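To make the idea concrete, here is a minimal sketch of one warehouse-style management feature (schema enforcement) layered on top of plain files, which stand in for cheap data lake storage. It is not how any particular lakehouse product works: the `SCHEMAS` registry, the `append_rows` and `read_table` helpers, and the `orders` table are all hypothetical names chosen for illustration, and CSV files in a temporary directory are an assumption standing in for object storage.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path

# Hypothetical schema registry: table name -> ordered (column, type) pairs.
SCHEMAS = {"orders": [("order_id", int), ("amount", float)]}


def append_rows(lake_dir: Path, table: str, rows: list[dict]) -> Path:
    """Validate rows against the declared schema, then append them to a CSV file."""
    schema = SCHEMAS[table]
    columns = [name for name, _ in schema]
    for row in rows:
        if set(row) != set(columns):
            raise ValueError(f"unexpected columns: {sorted(row)}")
        for name, col_type in schema:
            if not isinstance(row[name], col_type):
                raise TypeError(f"{name} must be {col_type.__name__}")
    path = lake_dir / f"{table}.csv"
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
    return path


def read_table(lake_dir: Path, table: str) -> list[dict]:
    """Read the file back and cast each column to its declared type."""
    schema = SCHEMAS[table]
    with (lake_dir / f"{table}.csv").open(newline="") as f:
        return [
            {name: col_type(rec[name]) for name, col_type in schema}
            for rec in csv.DictReader(f)
        ]


if __name__ == "__main__":
    lake = Path(tempfile.mkdtemp())  # stands in for low-cost lake storage
    append_rows(lake, "orders", [{"order_id": 1, "amount": 9.5}])
    print(read_table(lake, "orders"))
```

The storage stays a flat, inexpensive file, while the small layer on top rejects rows that do not match the declared structure. Real lakehouse platforms provide this kind of guarantee (and many more) at far larger scale.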

Once you combine a data lake with analytical infrastructure, the entire infrastructure can be called a data lakehouse (Inmon, Levins & Srivastava, 2021). The article “Evolution to the Data Lakehouse” by Bill Inmon and Mary Levins (2021) provides an overview of the main challenges of current data architectures and how data lakehouse architectures address them:

Lakehouse

After reading this article, you should be able to answer the following questions:

Questions

  • What are the challenges with current data architectures?

  • How does the data lakehouse architecture solve the key challenges of current data architectures? Describe the key features.

The book “Building the Data Lakehouse” by Inmon, Levins and Srivastava provides a high-level overview of important concepts of the lakehouse architecture.

The following sections are especially relevant for understanding the lakehouse concept:

  • The data lake (p19-22)

  • Current data architecture challenges (p22-23)

  • Emergence of data lakehouses (p23-29)

  • Different Types of Data in the Data Lakehouse (p39-50)

  • The Open Environment (p53-70)

  • The Analytical Infrastructure for the Data Lakehouse (p87-104)

  • Data Lakehouse Housekeeping™ (p135-168)

  • Purpose of data governance (p229-243)