Businesses should classify data, not IT

Today, tool and technology limitations are often used as the basis for classifying data, and worse, the resulting classification is itself incorrect. The term "big data" is a misnomer. I have already explained why in another article, "Why big data is actually small?"

Wouldn't it be simpler, more meaningful, and more standardized to classify data as primary or secondary, instead of using misnamed and non-standard terms such as small data and big data?

Primary data is the essential, core data without which the business cannot function. For example, a purchase transaction in a retail store has to be captured and stored for billing, payment, compliance, and so on. These are mandatory requirements. Business Intelligence (BI) on top of that data is very useful but not mandatory, and it is not the main reason the core data is stored in the first place.

Secondary data is any data the business can continue to function without, even if it is never captured. A customer's Twitter data is one example: it can provide value by letting BI analyze the customer for cross-selling or up-selling, but the business can still operate without it. Secondary data can be considered auxiliary data. Neither primary nor secondary data is restricted by size, velocity, or variety.
Primary and secondary data
With this classification, every business can group its data sets into primary and secondary based on how essential each set is for that particular business to operate. Because business models differ, what is primary data for one business may well be secondary data for another, even when both belong to the same sector. An example of classifying data sets into primary and secondary data for a retail chain is given below.

Primary and secondary data
For a business that is paid based on the number of website visits, capturing those visits is of utmost importance, so web stats become primary data for that business.
In this way, the business decides what is primary data and what is secondary, and the IT department doesn't have to force IT jargon such as big data on business stakeholders.
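The idea that the same data set can be primary for one business and secondary for another can be sketched in a few lines of Python. This is only an illustration: the data set names, the two example businesses, and the `classify` helper are assumptions made up for this sketch, not part of any real tool.

```python
from enum import Enum


class DataClass(Enum):
    PRIMARY = "primary"      # the business cannot operate without it
    SECONDARY = "secondary"  # useful, but operations continue without it


def classify(data_sets, essential_for_operations):
    """Label each data set by whether the business says it is essential."""
    return {
        name: DataClass.PRIMARY
        if name in essential_for_operations
        else DataClass.SECONDARY
        for name in data_sets
    }


# Hypothetical data sets and business answers, for illustration only.
retail_chain = classify(
    ["purchase_transactions", "web_stats", "twitter_data"],
    essential_for_operations={"purchase_transactions"},
)
ad_funded_site = classify(
    ["web_stats", "twitter_data"],
    essential_for_operations={"web_stats"},
)

for business, labels in [("retail chain", retail_chain),
                         ("ad-funded site", ad_funded_site)]:
    for name, data_class in labels.items():
        print(f"{business}: {name} -> {data_class.value}")
```

The only input is the business's own answer to "can we operate without this?", so nothing about size, velocity, or tooling enters the decision.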

How does this help business and business leaders?
It helps business leaders prioritize data-based initiatives. For example, for BI/analytics initiatives the business can first focus on deriving value from primary data, and then move on to secondary data.
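As a rough sketch of that prioritization, assuming each initiative has already been tagged with the class of data it depends on, a stable sort is enough to put primary-data initiatives first. The initiative names and the `prioritize` helper are hypothetical.

```python
def prioritize(initiatives):
    """Order initiatives so those built on primary data come first.

    sorted() is stable, so initiatives within the same class keep
    their original relative order.
    """
    order = {"primary": 0, "secondary": 1}
    return sorted(initiatives, key=lambda item: order[item["data_class"]])


# Hypothetical backlog, for illustration only.
initiatives = [
    {"name": "social_sentiment_analysis", "data_class": "secondary"},
    {"name": "sales_reporting", "data_class": "primary"},
    {"name": "cross_sell_from_web_stats", "data_class": "secondary"},
]

ranked = [item["name"] for item in prioritize(initiatives)]
print(ranked)
```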

Note that while acquiring secondary data carries additional cost, the cost of primary data is already accounted for as part of business operations.

