Showing posts from 2018

What is data profiling?

I wanted to profile data: CSV files with 60+ columns and more than 1 million rows. I started searching for an easy-to-use tool for profiling these files. That's when I noticed that data profiling is not clearly explained anywhere. So here is my attempt to cover this important topic, and I will also introduce the tool that I found during my search, which helped me a lot in getting to know the data. What is data profiling? In simple words, data profiling is a process in which we try to understand the characteristics of the data without associating it with a business process. So basically anyone can carry out data profiling on any data. You don't have to know who generates the data, where and how it is generated, or what its context is. What are the answers we are looking for? Some are listed below to give you an idea. How many columns are actually there in the file? Does that match the specification/documentation, if available?
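
To make the first profiling questions concrete, here is a minimal sketch in Python using only the standard library. It is not the tool mentioned in the post. The function name `profile_csv`, the `expected_columns` parameter and the sample data are assumptions made for illustration. It counts columns and rows, checks the header against an optional specification, and reports empty and distinct values per column.

```python
import csv
import io
from collections import Counter


def profile_csv(handle, expected_columns=None):
    """Return basic per-column statistics for a CSV stream with a header row."""
    reader = csv.reader(handle)
    header = next(reader)
    stats = [{"name": name, "empty": 0, "values": Counter()} for name in header]
    row_count = 0
    ragged_rows = 0  # rows whose field count differs from the header

    for row in reader:
        row_count += 1
        if len(row) != len(header):
            ragged_rows += 1
        # zip stops at the shorter sequence, so fields missing from a short
        # row are not counted as empty; such rows show up in ragged_rows.
        for col, value in zip(stats, row):
            if value.strip() == "":
                col["empty"] += 1
            else:
                col["values"][value] += 1

    return {
        "column_count": len(header),
        # None means "no specification was provided to compare against".
        "matches_spec": None if expected_columns is None else header == list(expected_columns),
        "row_count": row_count,
        "ragged_rows": ragged_rows,
        "columns": [
            {"name": c["name"], "empty": c["empty"], "distinct": len(c["values"])}
            for c in stats
        ],
    }


# Hypothetical sample data standing in for a real file.
sample = io.StringIO("id,country,amount\n1,DE,10\n2,,20\n3,DE,\n")
report = profile_csv(sample, expected_columns=["id", "country", "amount"])
print(report["column_count"], report["row_count"], report["matches_spec"])
for column in report["columns"]:
    print(column)
```

For a real file, pass `open(path, newline="")` instead of the in-memory sample. Keeping every distinct value in memory may be too expensive for high-cardinality columns in a file with a million rows, so treat the distinct counts as a starting point rather than a scalable design.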

Don't use front-end where it is not required

Regularly downloading data manually from a portal (frontend) and loading it into a backend system is like exiting an airport and then re-entering the same airport to catch a connecting flight from the same terminal when you have no business outside the airport. I have seen people doing this, downloading data from portals and then loading it into other systems manually. If no one is looking at the data in the UI, no decision is to be made, and only regular data feeds are required, then don't do it via the frontend (GUI); just get a data extract job created that will automatically load the backend system. This is partly related to one of the previous posts - BI can take you to places

Should business users spend their time in creating reports?

Should a marketing manager, an HR manager, a sales manager, or an account manager be spending time creating reports, or using reports to make decisions? On one side, total dependence on the BI team for all information needs can slow down business users. On the other hand, if business users have to create their own reports or work their way through dashboards or self-service BI to get to the numbers they are looking for, it could eat up their time and thereby decrease the time left for their real work, part of which is to make decisions based on information and insights. That's why there needs to be a balance: basic first-level information can be self-served, and for complex requirements BI teams spend time delivering the information.

Prediction vs Forecast

In the context of data analysis/BI, when is something called a prediction and when is it called a forecast? Quite a lot of people use these terms interchangeably. The dictionary can't help here either, see below. Source : So in the context of data analysis/BI I would say a forecast is based on past trends. A time series is involved. Based on previous behavior, future behavior is forecasted for a specific time period. On the other hand, a prediction may or may not be based on past trends. So not all predictions are forecasts, but all forecasts are predictions. In this way forecast is like a subset of prediction. Example of a prediction which is a forecast - Number of books that will be sold each month in the next 6 months. Example of a prediction which is not a forecast - Country X will win the world cup because they are a good team and in the best form compared to other teams. What do you think?
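
The book sales example can be sketched in a few lines of Python. This is only an illustration of "a forecast extrapolates a time series": the sales numbers are made up, the function name `linear_trend_forecast` is hypothetical, and a straight-line trend is just one of many possible forecasting methods.

```python
def linear_trend_forecast(history, periods):
    """Fit y = a + b*t by least squares and extrapolate `periods` steps ahead."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    slope_num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope_den = sum((t - t_mean) ** 2 for t in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * t_mean
    # Time steps n, n+1, ... are the future periods being forecasted.
    return [intercept + slope * t for t in range(n, n + periods)]


monthly_book_sales = [100, 110, 120, 130, 140, 150]  # made-up past 6 months
forecast = linear_trend_forecast(monthly_book_sales, periods=6)
print([round(v) for v in forecast])  # prints [160, 170, 180, 190, 200, 210]
```

The world cup statement, by contrast, has no past series being extrapolated over a defined period, which is why it is a prediction but not a forecast in the sense used above.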

New Year Resolution - Starting with IBI?

A few people who attended my presentations this year and became aware of the concept of IBI (Individual Business Intelligence), and a few people with whom I have been in contact this year, have expressed interest in starting with IBI from 2019 or have already started. It feels nice to get feedback like the one below. "I attended your presentation at the publication office. I just read your article on IBI and I love your idea." - Message on LinkedIn from a senior professional who attended my presentation of the PublicBI BI solution for EU Public Procurement at the Publications Office of the European Union, Luxembourg. It is really great that people have started or plan to start. I wish you all the best. For those who are still not sure what IBI is and how to use it, the links below should be useful. Basically, in this post I am placing all the important IBI links in one place, in sequence, so that it is easy for people to find and understand them. Short (9 minutes) presen

BI can take you to places

Using a BI team only as a data extract team is similar to using a car headlamp to light a room. You are in a dark room on the ground floor, somebody starts a car outside the room and the light from the headlamp of the car enters your room through the glass windows. You can now see some of the things in your room. You now order (you have authority unfortunately) the driver to keep the car on with the headlamp on. The driver tries his best to convince you to get a bulb for your room soon so that he can take the car places, but you don't understand, because you have never seen a car and don't know that it can move. This is how some uninformed business users and uninformed non-BI technical people view BI teams. They think the BI team has expertise in moving data, so let us use them for moving data. No, BI teams move data to consolidate, to combine, to integrate, to harmonize data, etc., so that users can get the full picture of the business based on the information an

Open data is the low hanging fruit within Public data

Public data is all the data that is publicly available for everyone to use for any purpose they wish. Open data is a subset of public data. Open data is well-defined, maintained, generally more reliable, and there is some assurance of continuous availability, as the data and the related documentation, APIs, access points, portals, etc., are made available by the generator (source institution) or by an authorized data aggregator organization. In this sense, from my point of view, open data is the low-hanging fruit within public data.

Tool to auto populate data based on a dimensional model

Are there any tools that can be used to auto populate data into tables based on a dimensional model? If none exists, maybe this is something some company can build and offer. Various tools, including ETL tools, provide a feature/component that can be used to generate dummy data based on a schema definition. What I am looking for is a tool to which we can provide a dimensional model (Facts and Dimensions) and a target DB connection, and which then auto populates dummy data into the tables (Dimensions first and then Facts), maintaining all the relationships intact. The tool should be able to create data for all SCD types and all types of Fact tables. A tool like this would help in speeding up prototyping, testing, visualizations, etc., so basically it would speed up development and delivery.
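
To illustrate the core idea (dimensions first, then facts that reference only existing dimension keys), here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `MODEL` dictionary format, the table and column names, the `populate` function, and the use of an in-memory SQLite database as the "target DB connection". It does not handle SCD types or different fact table types, which is the hard part a real tool would need to solve.

```python
import random
import sqlite3

# Hypothetical model description: dimension name -> attribute columns,
# fact name -> referenced dimensions and measure columns.
MODEL = {
    "dimensions": {
        "dim_customer": ["customer_name", "country"],
        "dim_product": ["product_name", "category"],
    },
    "facts": {
        "fact_sales": {"dimensions": ["dim_customer", "dim_product"],
                       "measures": ["quantity", "amount"]},
    },
}


def populate(conn, model, dim_rows=5, fact_rows=20, seed=42):
    """Create the tables and fill dimensions first, then facts."""
    rng = random.Random(seed)
    keys = {}  # dimension name -> list of generated surrogate keys

    for dim, attrs in model["dimensions"].items():
        cols = ", ".join(f"{a} TEXT" for a in attrs)
        conn.execute(f"CREATE TABLE {dim} ({dim}_key INTEGER PRIMARY KEY, {cols})")
        placeholders = ", ".join("?" for _ in range(len(attrs) + 1))
        for key in range(1, dim_rows + 1):
            values = [f"{a}_{rng.randint(1, 999)}" for a in attrs]
            conn.execute(f"INSERT INTO {dim} VALUES ({placeholders})", [key, *values])
        keys[dim] = list(range(1, dim_rows + 1))

    for fact, spec in model["facts"].items():
        fk_cols = [f"{d}_key INTEGER REFERENCES {d}({d}_key)" for d in spec["dimensions"]]
        measure_cols = [f"{m} REAL" for m in spec["measures"]]
        conn.execute(f"CREATE TABLE {fact} ({', '.join(fk_cols + measure_cols)})")
        width = len(spec["dimensions"]) + len(spec["measures"])
        placeholders = ", ".join("?" for _ in range(width))
        for _ in range(fact_rows):
            # Foreign keys are drawn only from keys that were actually inserted,
            # which keeps every fact row linked to an existing dimension row.
            fks = [rng.choice(keys[d]) for d in spec["dimensions"]]
            measures = [round(rng.uniform(1, 100), 2) for _ in spec["measures"]]
            conn.execute(f"INSERT INTO {fact} VALUES ({placeholders})", fks + measures)
    conn.commit()


conn = sqlite3.connect(":memory:")
populate(conn, MODEL)
orphans = conn.execute(
    "SELECT COUNT(*) FROM fact_sales f "
    "LEFT JOIN dim_customer c ON f.dim_customer_key = c.dim_customer_key "
    "WHERE c.dim_customer_key IS NULL"
).fetchone()[0]
print("fact rows without a matching customer:", orphans)
```

The final query is a simple referential integrity check: it counts fact rows that point to a customer key that does not exist in the dimension.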

Information from data is like bread from wheat

When people are hungry and in a hurry and lack bread-making skill, you can't give them wheat and ask them to make bread and then eat.  You need bakers who know how to transform wheat to bread in a scalable way and keep the bread ready for hungry people. This is exactly how BI professionals are required to transform data to information to derive insights and keep them ready for knowledge hungry users to consume. 

Google Data Studio - Excel on steroids for free

I have seen some people referring to one of the popular data visualization tools as Excel on steroids. Based on that, I think Google Data Studio is soon going to become the Excel on steroids tool for free. Thanks Google for bringing this amazing product. For those who are new to this, Google Data Studio is the data analysis and visualization platform from Google, and it's free to use.

European Commission awards prizes for the innovative use of public procurement data - European Commission

European Commission awards prizes for the innovative use of public procurement data - European Commission.

KABI Building Blocks and Workflow

Based on a request from one of the readers, the images that capture the building blocks of KABI and the workflow are attached below. KABI Building Blocks. KABI Workflow.

PublicBI BI Solution for EU Public Procurement is now online

After months of an unbelievable amount of hard work (no one except my wife and my 4-year-old son knows what I have gone through), it feels really great to have taken the Public Procurement BI Solution project to a shape in which we could present it at the EU Datathon challenge in Brussels on 2nd October, 2018 (a special day, Gandhi Jayanthi). And on top of that, I received an invitation from the European Commission to receive the award from the Portuguese Minister of the Presidency and of Administrative Modernization and Ms. Irmfried Schwimann, Deputy Director-General DG GROW, European Commission, and to present the solution briefly at the Digital Transformation in Public Procurement Conference, Lisbon on 18th October, 2018 in the presence of several distinguished guests, including members of the European Parliament. Happy to announce that the solution is online and can be accessed from this link ( ). I welcome all feedback on the solution. Our 7 minutes (hard limit)

Data Analysis of Data Scientists Survey Results Report

The Data Analysis of Data Scientists Survey Results Report has been published on  The report is available on this link - . 

EU Datathon 2018 Finalist

I feel very happy and motivated that the PublicBI team has been shortlisted as one of the finalists in the EU Datathon 2018 competition organized by the Publications Office of the European Union. For more details about EU Datathon see the official website . Congratulations to all the finalists and all the best! This now also means that I have to put some of my planned articles (for example GDPR) on hold so that I can work on the Datathon topic.

AWS Innovate 2018 - Interesting and Useful

Today I received my certificate for attending the AWS Innovate 2018 Online Conference on 19th July 2018. I managed to attend all of the sessions from the Big Data and Analytics track and a couple of sessions in the AI and Machine Learning track (see agenda below). All of the sessions that I attended were quite interesting and useful. If you missed it, don't worry: there is still an option to watch the sessions on demand for free. See . If you attended other tracks and can recommend a must-watch session, that would be great. I would like to sincerely thank all those who were involved in making this event successful. I can imagine the hard work that went into making this happen, based on my experience in organizing the first ever BI Online Conference ( PublicBI BIKON May 2018 ). Looking forward to more such events from the AWS team. AWS Innovate 2018 Agenda Screenshot

500 Days of IBI

For all those people who are following my blog, especially on the IBI topic, I am happy to announce that I have now completed 500 days of data capture. For all posts related to IBI see IBI posts . Just from being disciplined enough to capture the data for 500 days, and from the fact that I managed to do it and still continue doing it, there is some sense of achievement even though nothing is achieved. So today I spent some time on data analysis and used the data to reflect upon the days that went by. I went through the details of the mistakes that I have made, the days I was not happy, etc., to understand if there is a recurring pattern and to tell myself that I shouldn't repeat those. I also made some changes to the data capture template and included new charts too. I have now added a "Year" column in the data capture template because I now have data for more than a year. With this addition it is possible to compare the same months across years and also to compare year on y

GDPR Compliant Business Intelligence Solution - Part 1

Introduction With GDPR coming into action on 25th May 2018, some adjustments are required for most of the systems that run in an organization to ensure technical compliance, along with several other organizational measures - see Measures for GDPR compliance for a company , and obviously this is very much applicable to data intensive systems like BI. In this post I will limit the scope to BI solutions. If you are new to BI please take a few minutes to glance through this page Business Intelligence to get a rough idea about BI. Now on one side there is no point in building a GDPR compliant BI solution in such a way that it can't be used for business improvement purposes, or for decision-making at all levels in a company, because BI users will eventually stop using it and the system will become obsolete. On the other hand there is a huge risk (both financial and loss of reputation) in building or maintaining a BI solution that's fully in use but not compliant with GDPR. So

PDRHS - Personal Data Request Handling System for GDPR Compliance - Part 2

Continued from part 1 , now we go deeper into handling the requests from data subjects. This is explained using the simple flow chart provided below. Based on the above flow chart we can now easily list the various processes that need to be carried out by PDRHS. Note that the processes mentioned here are specific to PDRHS, with the underlying assumption that all other systems are already GDPR compliant. Facilitate reception of various types of requests from data subjects. Store the request. Classify the request as fake or genuine. Verify the identity of the data subject. Collect additional personal details if necessary to verify identity. Categorize based on type of request. Check the frequency between the requests. Estimate the charge/fee to be applied for too frequent requests. Set the level based on which too frequent requests may be rejected. Find and consolidate data about data subject. Collect information about automated decision making process

PDRHS - Personal Data Request Handling System for GDPR Compliance - Part 1

In one of my previous posts ( Measures for a Company for GDPR compliance ) on the GDPR topic I listed the measures that a company has to take to ensure GDPR compliance. One of the measures is to implement a PDRHS (Personal Data Request Handling System). In this post I will go into more detail about PDRHS. Again, I will limit the scope to companies (excluding public bodies and others). PDRHS is an abstraction of a system that enables data subjects to exercise their rights related to personal data. PDRHS is expected to manage the life cycle of data subjects' requests related to personal data. In terms of complexity, a PDRHS could be anywhere between very simple and very complex, and in terms of automation anywhere between fully manual and fully automated, depending upon the type and size of the company and the number of data subject requests the company receives. Companies like Facebook and LinkedIn already provide means to exercise some of our personal data rights in

Who are the data subjects in the context of a company?

The diagram below shows typical data subjects in the context of a company. Users are represented separately from customers to differentiate the group that uses a service, software, etc., but does not pay for it. For example Facebook users, LinkedIn standard account users, Duolingo users, etc. We as users don't pay these companies, so we are ideally not customers. For all posts related to GDPR see GDPR For rights related to data subjects see How do I as a data subject benefit from GDPR? Disclaimer : I am not a legal expert nor a certified GDPR consultant (not sure if there is such a certification yet). I am a data enthusiast (and now GDPR enthusiast) and I like to envisage, conceptualize and design solutions for real problems. All posts related to GDPR are only to present my understanding and to start a good discussion with the audience. As every business is different please consult legal experts to understand obligations specific to your company. For offi

How can companies benefit from GDPR?

If you thought that GDPR is all about providing more rights to data subjects and about making it difficult for companies to run businesses, then you have not understood GDPR fully. Whether you consider GDPR an opportunity or a threat depends on your level of preparedness and your positioning. From my point of view GDPR creates a level playing field for all and thereby helps smaller companies or new entrants. How can companies benefit from GDPR? Some random and disconnected examples of benefits for companies are provided below. Companies that plan to build the next Facebook or LinkedIn have better chances of getting existing user data from those platforms into their platforms. How? By ensuring that their platform can consume data from Facebook or LinkedIn easily, and by informing people about their right to data portability and giving them incentives to exercise that right. Companies can now compete in the market not just based on pricing but also based on "

How do I as a data subject benefit from GDPR?

How do I as a data subject benefit from GDPR? GDPR provides several rights to data subjects: Right to personal data protection; Right to know or obtain confirmation of personal data processed; Right to access personal data processed; Right to receive personal data; Right to data rectification; Right to erasure (Right to be forgotten); Right to object to fully automated decision-making / Right to obtain human intervention; Right to object to further processing; Right to data portability / Right to transmit data from one controller to another; Right to lodge a complaint with a supervisory authority; Right to effective judicial remedy. We now have the right to know: Who / Which companies are processing our personal data? What purposes are they processing it for? With whom / which companies is our personal data shared? Is there any automated decision-making involved? Where possible, to know the details of the logic involved in the automated decision-making that concerns us. How lon

What is GDPR?

What is GDPR? GDPR stands for General Data Protection Regulation ( Regulation (EU) 2016/679 ), which came into force on 24 May 2016 and is applicable from 25 May 2018 onwards. For official documentation please check the official website of the EU - From my point of view, the EU (European Union) has taken a visionary, bold and exemplary step in creating a regulation that is so comprehensive for the protection of personal data of natural persons and the free movement of data within the EU. GDPR provides more rights to all of us natural persons (data subjects). How do I as a data subject benefit from GDPR? See How do I as a data subject benefit from GDPR? How can companies benefit from GDPR? See How can companies benefit from GDPR For all posts related to GDPR see GDPR Disclaimer : I am not a legal expert nor a certified GDPR consultant (not sure if there is such a certification yet). I am a data enthusiast (and now GDP

Measures for GDPR Compliance for a Company

What measures should a company take to be GDPR compliant? GDPR is applicable not only to companies but also to organizations like public administrations. In the diagram below I present my understanding of what measures a company should take to be GDPR compliant. I guess all of the measures mentioned in the above diagram, except for the Personal Data Request Handling System (PDRHS), should be easily understood. I will explain PDRHS in future posts in this blog. I also plan to cover how companies can leverage existing DWBI tools and the expertise of DWBI professionals within the company to implement some of the technical measures. GDPR Compliance Bottom-up approach. Related posts : What is GDPR? For all posts related to GDPR see - GDPR Disclaimer: I am not a legal expert nor a certified GDPR consultant (not sure if there is such a certification yet). I am a data enthusiast (and now GDPR enthusiast) and I like to envisage, co

Master's in data science or MBA?

Question from one of my connections on LinkedIn who has over 8 years of experience in BI in various roles. Question : I am having a tough time these days in deciding about my future. Want to move ahead into leadership role for a data science team. Want to know if I should do a masters degree course in Data science or an MBA. Which will pave a better path for future 10-15 years of professional career. Can u please recommend something on this.??? My answer : My current view, based on the current market situation, is given below. I would definitely go with a Master's degree in Data Science among the two options. If there is a possibility to get to leadership roles in a data science team without doing either, I would also recommend that, because you already have good work experience, unlike freshers, i.e. instead of going for a full-time master's course, choose to learn data science and implement your own projects or contribute to data science teams at work and then eventually move into data s

Data Analysis of Data Scientists

To clear confusion and to help everyone, especially aspiring data scientists, PublicBI has come up with a survey called Data Analysis of Data Scientists. I request all data scientists to take around 5 minutes of their time to participate in the survey and provide valuable inputs. The summarized results will be published in . Results will be free to use for all. If you are a Data Scientist please take part in it. If you are not a Data Scientist please share the survey link with Data Scientists you know. Survey link is here - Data Analysis of Data Scientists

Is it easy to learn another ETL software once you know how to use one ETL?

This was one of the questions from a person via LinkedIn. He wants to know if it's easy to learn another ETL software once you know how to use one ETL tool. Also, if you know one of the Reporting and Analytics tools like SAP BO, is it easy to learn another one like MicroStrategy? My answer : In general, yes, it's easier to learn the second tool in the same category. From my own experience, it was quite easy to learn Ab initio after having worked with DataStage, and it was again quite easy to learn BODS after Ab initio and DataStage. After working with these 3 tools it was quite easy to understand what Pentaho Data Integration (PDI) offers when I led a team which worked with PDI. Similarly, it was quite easy to pick up MicroStrategy after working with SAP BO. The same goes for other tools like Tableau, Excel, Google Sheets, GDS, etc.

BI Analyst in banking to retail or pharmaceutical

Question : Is it possible to work in banking as a business intelligence analyst and then after some years change to work in the retail industry or pharmaceutical industry? This was one of the questions from a LinkedIn connection. He wants to know if it's possible to switch domains (for example from banking to retail or pharma) while working as a business intelligence analyst. My answer : In general, yes, it's possible. The skills you need are the same. Your experience will definitely support you. You will have to pick up the domain knowledge quickly. Domain knowledge is the most important aspect in this role from my point of view.

KABI now in Spanish (El Salvador)

Happy to announce that, among others, one of the consulting firms in El Salvador became interested in the KABI methodology and has started publishing and promoting KABI in Spanish. See For more about KABI see - If you need a 2-hour KABI training (online or onsite) or KABI coaching (a few days onsite) please email or use the PublicBI contact us page.

What is Business Intelligence?

As part of the welcome note during the recently organized PublicBI BIKON event I clarified what exactly Business Intelligence is and also cleared up the confusion around the definition of BI. Watch this short video to get an understanding.

PublicBI EBIT - Essentials of Business Intelligence - Training

As part of PublicBI I will be delivering a crash course on BI called PublicBI EBIT (Essentials of Business Intelligence - Training). See agenda slides below. For more details see PublicBI EBIT . 

PublicBI BIKON Videos now on YouTube

All of the videos (except for the Q&A part) of the PublicBI BIKON event are now available on YouTube on PublicBI's YouTube channel: Welcome note and introduction to BI; A revolutionary IoT reputation project becomes economically viable through pattern based Real-Time Big Data Technology; Significance of ETL in this era of Data Science, Analytics and Cloud Technologies; Data to Insights - A Personal Journey by Rajesh Patel, Active Intelligence; Effective Testing Strategies and Tools for DWH/BI Projects by Wayne Yaddow; Business Intelligence Roles, Skills and Tools in Demand, and Tips for BI Jobs in Germany; Humanizing data science by Jeff Baird; Introducing Samni Systems.

PublicBI BIKON (International Business Intelligence Conference) on 3rd May, 2018 - Agenda

7 speakers from 7 cities in 5 time zones and 3 continents. 8 talks * 2 rounds. 4 "Ask any BI questions" breaktime sessions. 25+ international cities from where people have registered as participants. Registrations from Melbourne to Los Angeles. 30+ job titles, including Developers, Consultants, and CXOs. 20 hours to go until the start of the first ever PublicBI BIKON (International Business Intelligence Conference) event. PublicBI BIKON is a BI conference that covers all time zones in a single day. Think of PublicBI BIKON as the Car Show for Business Intelligence! We have got some test drive offers too! And the most important part is, it's FREE! No travel, and no payment required! Sounds interesting? Register at . Participant link will be sent to the registered participants shortly.

Individual Business Intelligence - Free Public Webinar today at 3 PM CEST

Germany, India, Tunisia, UK, Austria, and Singapore are some of the countries from where people have registered for the Individual Business Intelligence free public webinar scheduled for today at 3 PM CEST (Munich Time). Thank you for the registrations. I have sent invites to each one of you. A couple of emails have bounced back because of typos in the email address provided. You may join the webinar on my conference wall at . I guess this will be my last voluntary public webinar on IBI, so those who missed the chance to register are also welcome and may join via the link provided above. Looking forward to e-meeting you all.

PublicBI BIKON - Test drive and offers

BI/Data professionals, here is your chance to get some exciting offers from the sponsors of PublicBI BIKON.  Register for the event at  .

International Business Intelligence Conference on 3rd May - PublicBI BIKON - Now open for registrations - Free

PublicBI UG is organizing PublicBI BIKON, the 1st ever online International Business Intelligence Conference. This is your chance to listen to BI experts on interesting BI and related topics. Get to know some of the interesting BI players in the market. You also get to ask any BI related questions to the entire audience during the break time, apart from the session related questions that you may ask the speakers during the sessions. It is also your chance to get some of the promotional offers. For example, take some of the tools for a test drive for a longer duration or get $1000 worth of training from AIS. Open now for registrations. PublicBI BIKON May 3rd event - PublicBI BIKON page -

Companies, political parties and others are misusing your data. Have you thought about using your own data?

Most of us, especially those in BI and related professions, always knew that companies are using our data. And now it's all over the news how companies, and even political parties, are misusing our data. What does this mean? It's very evident that your personal data has a lot of value, and that's why companies and others are trying their best to get their hands on it, sometimes even without your active consent. On one side, they are able to see value in your data. They are using your personal data for their benefit. And on the other side, most of us as individuals are not even able to see the value in our own data, our individual data. We think data is an asset only for companies and not for individuals, and so we don't even attempt to capture our own data, mostly because of lack of awareness about the possibilities and lack of know-how in data capture and data analysis. To create awareness about the possibilities, I will be conducting a free public webinar about I

IndividualBI New Templates and Video

IBI Templates The IndividualBI or IBI Template has been updated recently. I have now included 3 new sheets in the template: "Pivot Table", "Charts" and "Max and Min". For those who are not familiar with analyzing data, creating pivot tables, creating charts, etc., these sheets should help you get started with analyzing and visualizing your data. Additionally, I have now made two versions of the template, a Google Sheet version and an MS Excel version. I hope this helps a lot of people. Google Sheet version can be downloaded from here : Individual BI Google Sheet Version Excel version can be downloaded from here : IndividualBI Excel Version I have tried to ensure that there are no errors in the formulas, calculations, etc.; however, as no one else has reviewed it, there may be some mistakes/errors. If you find any errors please bring them to my notice and I will try to correct
