Data Warehouse characteristics

A data warehouse is usually part of a business intelligence solution. A data warehouse can also be built for regulatory reporting or to meet legal requirements. It is where data from one or more sources is stored centrally and historized (versioned) after being cleansed, transformed and unified. A data warehouse is designed to make reporting and analysis on large amounts of data easy. Without one, creating consolidated reports from data in various sources would be very difficult and time-consuming, and so would time-trend reporting, because most source systems store only a current snapshot of the data for operational reasons.

In its simplest form, a data warehouse (DWH) is a data store specifically modeled for fast, efficient reporting and analysis of historic, current and calculated data. A good business intelligence solution is usually backed by a data warehouse. The main characteristics of a DWH are:
  • Data is historized (versioned). For example, when a customer moves from status A to status B, a transactional system usually stores only the current status, whereas the DWH stores both records, each with its validity period (valid from and valid to).
  • Reads are fast (minimal response time for read operations). The access layer of the DWH is deliberately denormalized by introducing controlled redundancy, and it is usually modeled as a star schema.
  • Data lineage is maintained, i.e. the source of the data can be traced.
  • Data is clean and unified. For example, one source system may store Male as Male, Female as Female and Not disclosed as Unknown, whereas another may store Male as M, Female as F and Not disclosed as Null. After the data is extracted from both systems, and before it is loaded into the DWH, it is cleansed to a single convention: M and F for Male and Female, and Unknown for Not disclosed in both cases.
  • Data is integrated, i.e. data from multiple sources is consolidated.
  • Data is usually not hard-deleted unless regulatory requirements or performance issues call for it.
  • Auditing is enabled on data changes, and almost all metadata about those changes is logged.
  • It has a large data storage capacity.
  • The DWH is usually loaded by batch jobs that run asynchronously to data changes in the source systems.
  • New data for existing data areas can be added with ease.
  • New data sources can be integrated with ease.
  • Data within the DWH is organized by subject area rather than by application. For example, two or more source applications may hold sales data; when a sales data mart is built in the DWH, the sales data from all of these applications is combined and stored under the sales subject area.
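The historization described in the first bullet (valid from / valid to) is commonly implemented as a Slowly Changing Dimension of Type 2 (SCD2). The Python sketch below is a minimal illustration under assumptions: the `CustomerVersion` record, the `apply_change` and `status_as_of` functions, and the use of `date.max` as the open-ended "valid to" marker are hypothetical choices, not taken from any particular DWH product. A change never overwrites the old row; it closes it and opens a new one.

```python
from dataclasses import dataclass, replace
from datetime import date

# Hypothetical sentinel meaning "still valid" (open-ended valid_to).
OPEN_END = date.max

@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    status: str
    valid_from: date
    valid_to: date = OPEN_END

def apply_change(history, customer_id, new_status, change_date):
    """Return a new history list with an SCD2-style status change applied.

    The currently valid row is closed (valid_to = change_date) and a new
    row valid from change_date is appended; old rows are never overwritten.
    """
    result = []
    current = None
    for row in history:
        if row.customer_id == customer_id and row.valid_to == OPEN_END:
            current = row
            # Close the current version instead of updating it in place.
            result.append(replace(row, valid_to=change_date))
        else:
            result.append(row)
    if current is not None and current.status == new_status:
        return list(history)  # no real change, keep the history untouched
    result.append(CustomerVersion(customer_id, new_status, change_date))
    return result

def status_as_of(history, customer_id, as_of):
    """Return the status that was valid for the customer on a given date."""
    for row in history:
        if row.customer_id == customer_id and row.valid_from <= as_of < row.valid_to:
            return row.status
    return None

# Usage: the customer moves from status A to status B on 2021-06-01.
history = [CustomerVersion(1, "A", date(2020, 1, 1))]
history = apply_change(history, 1, "B", date(2021, 6, 1))
print(status_as_of(history, 1, date(2020, 12, 31)))  # A
print(status_as_of(history, 1, date(2022, 1, 1)))    # B
```

Because both versions are kept, a time-trend report can ask what the status was on any past date, which a source system holding only the current snapshot cannot answer.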
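The cleansing and integration steps from the gender example come down to mapping each source's codes onto one target convention and combining the results. A minimal Python sketch, assuming two hypothetical sources named `system_a` and `system_b`, with the second source's Null represented as Python `None`. The `source_system` column is likewise an assumption, shown as one simple way to keep lineage; real ETL tools usually hold such mappings in configuration or lookup tables.

```python
# Hypothetical per-source mappings onto the DWH convention (M, F, Unknown).
GENDER_MAPPINGS = {
    "system_a": {"Male": "M", "Female": "F", "Unknown": "Unknown"},
    "system_b": {"M": "M", "F": "F", None: "Unknown"},
}

def cleanse_gender(source, raw_value):
    """Translate a source-specific gender code to the unified DWH code."""
    mapping = GENDER_MAPPINGS[source]
    if raw_value not in mapping:
        # Surface unmapped values instead of silently loading them.
        raise ValueError(f"unmapped gender code {raw_value!r} from {source}")
    return mapping[raw_value]

def integrate(records_by_source):
    """Combine records from several sources into one cleansed list.

    Each output row records its source system so lineage can be traced.
    """
    unified = []
    for source, records in records_by_source.items():
        for rec in records:
            unified.append({
                "customer_id": rec["id"],
                "gender": cleanse_gender(source, rec["gender"]),
                "source_system": source,
            })
    return unified

rows = integrate({
    "system_a": [{"id": 1, "gender": "Female"}, {"id": 2, "gender": "Unknown"}],
    "system_b": [{"id": 7, "gender": "M"}, {"id": 8, "gender": None}],
})
print([r["gender"] for r in rows])  # ['F', 'Unknown', 'M', 'Unknown']
```

Raising an error on an unmapped code is a design choice: it stops inconsistent values from reaching the DWH, at the cost of failing the load until the mapping is extended.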
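A star schema places a central fact table (the measures) among denormalized dimension tables (the descriptive attributes), so a typical report needs only one join per dimension. The sketch below builds a tiny sales data mart with Python's built-in `sqlite3` module and an in-memory database. The table and column names (`dim_product`, `dim_date`, `fact_sales`) and the rows are hypothetical, made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: category_name is repeated on every product row
    -- (controlled redundancy) instead of living in a separate table.
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_name TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,  -- e.g. 20240115
        year     INTEGER,
        month    INTEGER
    );
    -- Fact: one row per sale, holding dimension keys plus the measure.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'),
                                   (2, 'Phone',  'Electronics'),
                                   (3, 'Desk',   'Furniture');
    INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240210, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 20240115, 1200.0),
                                  (2, 20240115, 800.0),
                                  (3, 20240210, 300.0),
                                  (2, 20240210, 700.0);
""")

# A typical report: sales per category and month, one join per dimension.
query = """
    SELECT p.category_name, d.year, d.month, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    GROUP BY p.category_name, d.year, d.month
    ORDER BY p.category_name, d.year, d.month
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
# ('Electronics', 2024, 1, 2000.0)
# ('Electronics', 2024, 2, 700.0)
# ('Furniture', 2024, 2, 300.0)
```

In a normalized operational model the category would typically sit in its own table, adding another join to every such report; storing it on the product dimension trades some redundancy for simpler, faster reads.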
