Data Wrangling

In the context of business intelligence, data wrangling means converting raw data into a form suitable for aggregation and consolidation during data analysis. Before data is analyzed or visualized, we need to make sure it has been unified.

A simple example: if you want to visualize the number of customers by city, you need to ensure that there is only one row per city before visualizing the data. If two rows, such as Muenchen and Munich, represent the same city, the results could be wrong. The data analyst or user has to change one of the rows manually. This is done by creating a mapping on the fly in the visualization tool, which is then applied to every row of data to catch further occurrences of the same issue. The process is then repeated for other cities.
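To make the city example concrete, here is a minimal sketch in Python of what such a mapping does. It is not taken from any particular visualization tool. The names CITY_ALIASES, canonical_city and customers_by_city, as well as the sample rows, are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical mapping from known spelling variants to one canonical name.
# In practice the analyst builds this up as new variants are discovered.
CITY_ALIASES = {
    "muenchen": "Munich",
    "münchen": "Munich",
    "munich": "Munich",
}

def canonical_city(name: str) -> str:
    """Return the canonical city name, or the trimmed input if no alias is known."""
    key = name.strip().casefold()
    return CITY_ALIASES.get(key, name.strip())

def customers_by_city(rows):
    """Count customers per city after applying the mapping to every row."""
    counts = Counter()
    for _customer_id, city in rows:
        counts[canonical_city(city)] += 1
    return dict(counts)

# Made-up sample data: (customer_id, city) pairs with inconsistent spellings.
rows = [(1, "Muenchen"), (2, "Munich"), (3, "Berlin"), (4, " munich ")]
print(customers_by_city(rows))
```

Without the mapping, the three Munich customers would be split across several rows. With it, they are counted under a single city, which is what the visualization needs.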

In a BI solution backed by a data warehouse, all of this data transformation, cleaning, mapping, etc., is handled by the ETL/ELT process before the data is presented to the user. The end user therefore doesn't have to bother with these preparatory steps, because they already receive unified, cleaned and transformed data that is ready for analysis. But if there is no data warehouse and the data visualization tool accesses transactional data directly, then the end users have to deal with data preparation themselves, and one of the steps in data preparation is data wrangling.

