Data Warehouse ETL

Data Warehouse Architecture: A Guide

Introduction

A data warehouse (DW or DWH) is a system that stores historical and cumulative data used for forecasting, reporting, and analysis. Building one entails gathering, cleansing, and transforming data from various source streams before loading it into fact and dimension tables.
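The gather, cleanse, transform, and load steps can be illustrated with a minimal sketch. It assumes an in-memory SQLite database stands in for the warehouse; the `RAW_SALES` records and the `dim_customer` and `fact_sales` table names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical raw records gathered from source systems (illustrative data only).
RAW_SALES = [
    {"customer": " Alice ", "product": "widget", "amount": "19.90"},
    {"customer": "BOB", "product": "Widget", "amount": "5.00"},
    {"customer": "alice", "product": "gadget", "amount": None},  # missing amount
]


def cleanse(rows):
    """Drop rows without an amount and normalize the text fields."""
    for row in rows:
        if row["amount"] is None:
            continue
        yield {
            "customer": row["customer"].strip().title(),
            "product": row["product"].strip().lower(),
            "amount": float(row["amount"]),
        }


def load(conn, rows):
    """Load cleansed rows into one dimension table and one fact table."""
    conn.executescript(
        """
        CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (customer_id INTEGER, product TEXT, amount REAL);
        """
    )
    for row in rows:
        # Each customer appears once in the dimension table.
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (name) VALUES (?)",
            (row["customer"],),
        )
        (customer_id,) = conn.execute(
            "SELECT id FROM dim_customer WHERE name = ?", (row["customer"],)
        ).fetchone()
        # The fact row references the dimension row by key.
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            (customer_id, row["product"], row["amount"]),
        )


conn = sqlite3.connect(":memory:")
load(conn, cleanse(RAW_SALES))

total_by_customer = conn.execute(
    """
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.id = f.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(total_by_customer)
```

A production pipeline would typically use a dedicated ETL tool and a real warehouse engine; the point here is only the order of the steps and the split between dimension and fact tables.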

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile data structure.

The DWH integrates data from multiple sources, providing the user with a single source of information in a consistent format while focusing on the subject rather than the operations. Because it is non-volatile, all data changes are recorded as new entries without erasing the previous state.
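One way to picture the non-volatile property is an append-only history table. The sketch below is an assumption-laden illustration, not a prescribed design: the `customer_history` table, the `record_change` and `current_city` helpers, and the sample dates are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_history (customer_id INTEGER, city TEXT, valid_from TEXT)"
)


def record_change(conn, customer_id, city, valid_from):
    """Record a new state as an additional row; earlier rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, city, valid_from),
    )


def current_city(conn, customer_id):
    """Return the most recent city for a customer, or None if there is no history."""
    row = conn.execute(
        "SELECT city FROM customer_history "
        "WHERE customer_id = ? ORDER BY valid_from DESC LIMIT 1",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


record_change(conn, 1, "Berlin", "2021-01-01")
record_change(conn, 1, "Munich", "2023-06-15")  # a change, stored as a new entry

history = conn.execute(
    "SELECT city, valid_from FROM customer_history "
    "WHERE customer_id = 1 ORDER BY valid_from"
).fetchall()
print(current_city(conn, 1))
print(history)
```

Because the earlier row is kept, the table can answer both "where is the customer now?" and "where was the customer in 2022?".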

The architecture of a Data Warehouse

A data warehouse system can be built in three ways. The number of tiers in the architecture is used to categorize these approaches.

  • Single-tier architecture
  • Two-tier architecture
  • Three-tier architecture

Single-tier architecture: A single-layer structure designed to minimize the amount of data stored, mainly by removing redundancy. In practice, this structure is rarely used.

Two-tier architecture: A two-layer structure that separates the physically available data sources from the data warehouse into which their data is transformed and loaded. A data warehouse can be deployed in a variety of ways, and it is critical to select the one that best fits your company's needs. The most important factor to consider is scalability, particularly when a large volume of data must be stored efficiently.

Three-Tier Architecture

  • The bottom tier is the data warehouse's database server, which is usually a relational database system.
  • The middle tier is an OLAP server, implemented with either the ROLAP or the MOLAP model. It sits between the end user and the database, so users interact with the OLAP server rather than with the database directly.
  • The top tier is the front-end client layer. It is the user's first point of contact with the data: this is where data is presented to the end user and where decisions are made based on it.
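The three tiers can be sketched as three small pieces of code. This is a toy illustration under stated assumptions: SQLite stands in for the relational bottom tier, a single ROLAP-style function stands in for the OLAP server, and the `sales` table, `rollup`, and `render_report` names are hypothetical.

```python
import sqlite3


# Bottom tier: a relational database holding the warehouse data.
def build_bottom_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("north", 2022, 100.0), ("north", 2023, 150.0), ("south", 2023, 80.0)],
    )
    return conn


# Middle tier: a ROLAP-style layer that turns a request for a dimension
# into SQL, so the client never writes queries against the database itself.
ALLOWED_DIMENSIONS = {"region", "year"}


def rollup(conn, dimension):
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # The dimension name is checked against a whitelist before interpolation.
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(conn.execute(query).fetchall())


# Top tier: the front-end client that presents the data to the end user.
def render_report(totals):
    return "\n".join(f"{key}: {value:.2f}" for key, value in totals.items())


conn = build_bottom_tier()
report = render_report(rollup(conn, "region"))
print(report)
```

The client only calls `rollup` and `render_report`; swapping the bottom tier for a different database would not change the top tier, which is the isolation the middle tier is meant to provide.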

Properties

Summary

  • A data warehouse is a type of information system that stores historical and cumulative data from one or more sources. Data warehouses themselves come in several forms, including traditional, cloud, and virtual data warehouses.
  • A data warehouse is subject-oriented because it provides information about a subject, such as sales or customers, rather than about the organization's ongoing operations.
  • Integration in a data warehouse means establishing a common unit of measure for all related data coming from different databases.
  • A data warehouse is non-volatile: old data is not erased when new data is loaded into it.
  • A data warehouse is time-variant because its data covers a long time horizon and each record is tied to a point in time.
  • Data warehouse architecture consists mainly of five components: 1) Database 2) ETL tools 3) Metadata 4) Query tools 5) Data marts
  • Query tools fall into four major types: 1) Query and reporting tools 2) Application development tools 3) Data mining tools 4) OLAP tools
  • Data sourcing, transformation, and migration tools carry out all conversions and summarizations.
  • Metadata is vital in data warehouse architecture because it identifies the source, usage, values, and attributes of the data in the warehouse.
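The integration point in the summary, a common unit of measure across sources, can be illustrated with a short sketch. The two source lists, the `TO_KG` conversion table, and the `integrate` function are hypothetical; only the pound-to-kilogram factor is a fixed definition.

```python
# Conversion factors to the warehouse's common unit (kilograms).
# 0.45359237 is the defined value of the international pound in kilograms.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

# Hypothetical extracts from two source databases using different units.
SOURCE_A = [{"sku": "A1", "weight": 2.0, "unit": "kg"}]
SOURCE_B = [{"sku": "B7", "weight": 10.0, "unit": "lb"}]


def integrate(*sources):
    """Merge rows from all sources, expressing every weight in kilograms."""
    unified = []
    for source in sources:
        for row in source:
            factor = TO_KG[row["unit"]]  # raises KeyError for an unknown unit
            unified.append(
                {"sku": row["sku"], "weight_kg": row["weight"] * factor}
            )
    return unified


integrated = integrate(SOURCE_A, SOURCE_B)
print(integrated)
```

Failing loudly on an unknown unit is a deliberate choice in this sketch: silently loading unconverted values would defeat the purpose of integration.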