Showing posts from 2020

Big data is a big distraction

In my view, big data is a big distraction, especially when full value has not yet been extracted from the data that is not big data. Big data requires big investments, big teams and big timelines, but whether big value will be derived from so-called big data is still an open question. I don't disagree that dealing with so-called big data is a big challenge. It may motivate, look fancy and challenge some technical people. However, the question remains: does it bring a big return on investment for the business? Happy to be proven wrong. Open to learn. I would like to get some real examples of big data investments. Please don't point to abstract articles in random blogs that don't have the details. The details I am looking for are: 1) Why did the project term it a big data investment? What makes it big data? What exactly makes it different from a BI or Analytics investment? 2) How much was spent? Total cost that includes people or talent cost and all th

Businesses should classify data not IT

Currently, tool and technology limitations are used as the basis for classifying data, and even worse, the classification itself is incorrect. So-called big data is wrongly named. I have already explained why the naming is incorrect in another article - Why big data is actually small? Wouldn't it be much simpler, better, more meaningful and more standardized to classify data into primary and secondary data instead of using misnamed, meaningless and non-standard terms such as small data and big data? Primary data is the essential or core data without which the business cannot function. For example, a purchase transaction in a retail store has to be captured and stored for billing, payment, compliance, etc. These are mandatory requirements. Business Intelligence (BI) on top of that data is not mandatory but very useful; however, BI is not the main purpose of storing that core data in the first place. Secondary data is all of that data which e

ETL developer vs Data engineer

Rephrased question: ETL developer vs Data engineer Answer: Unfortunately, there are no strict industry standards for these job titles. That is just one part of it. Before ETL tools such as DataStage, Informatica, Ab Initio, etc., became popular, developers were hand coding every ETL flow. These ETL tools shortened ETL flow development time to a great extent and allowed ETL developers to focus on the business rules/logic/requirements (what to implement) rather than on how to code it or optimize the code. There are many other benefits of using a tool but I won’t go into that in this answer. So an ETL developer with experience in these tools but without any programming (coding) experience was/is able to design and develop end-to-end data flows. Whenever a new type of source/target data format comes up, these tools catch up, but it takes time, i.e., the ETL tool provider (e.g. Microsoft, IBM, etc.) has to add new components/connectors to the ETL tool before it can work with the new data format. For example, let’

Future data in data warehouse?

Rephrased question: Does a data warehouse store future data (meaning forecasts or predictions)? Answer: A data warehouse stores whatever the company/business has decided to store. Nothing stops anyone from storing forecasts. So if forecasts are created and need to be stored, then yes, they are stored. For example, for a gas group in the UK we did a project in which we had to store the forecasts for the next 25 years. This was a yearly exercise, which means that every year the forecasts for the next 25 years were stored and versioned, so that we could also fetch the differences between the forecasts.
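
The versioning idea above can be sketched in a few lines. This is only an illustration and not the design of the actual project: the table name `forecast`, its columns, the helper functions and all the numbers are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE forecast (
        forecast_version INTEGER NOT NULL,  -- year in which the forecast was produced
        target_year      INTEGER NOT NULL,  -- future year being forecast
        forecast_value   REAL    NOT NULL,  -- forecast figure (made-up units)
        PRIMARY KEY (forecast_version, target_year)
    )
    """
)


def load_forecast(version, values_by_year):
    """Store one yearly forecast run as a new version; older versions are kept."""
    conn.executemany(
        "INSERT INTO forecast VALUES (?, ?, ?)",
        [(version, year, value) for year, value in values_by_year.items()],
    )


def forecast_diff(old_version, new_version):
    """Return (target_year, newer value minus older value) for years in both versions."""
    return conn.execute(
        """
        SELECT newer.target_year,
               newer.forecast_value - older.forecast_value
        FROM forecast AS newer
        JOIN forecast AS older
          ON older.target_year = newer.target_year
        WHERE older.forecast_version = ?
          AND newer.forecast_version = ?
        ORDER BY newer.target_year
        """,
        (old_version, new_version),
    ).fetchall()


# Hypothetical figures; a real run would cover 25 target years per version.
load_forecast(2019, {2020: 100.0, 2021: 105.0, 2022: 110.0})
load_forecast(2020, {2021: 103.0, 2022: 112.0, 2023: 118.0})

diffs = forecast_diff(2019, 2020)
print(diffs)
```

Keeping the version as part of the key means that a new yearly run never overwrites an older one, which is what makes the comparison between forecasts possible.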

Importance of data privacy compared to data governance?

Rephrased question: What is the level of importance of data privacy compared to the level of importance of data governance? Answer: Data privacy is one of the aspects within data governance. Simply put, in data governance there is on one side a need to ensure that data is secure and protected and that it doesn’t fall into the wrong hands, and on the other side a need to ensure that value is derived from data and that data is monetized. Data governance should come up with policies, frameworks, principles, etc., that satisfy/balance both sides.

Data Analyst or Business Analyst experience to become a Data Engineer?

Rephrased question: Which job would lead to a data engineer job? Is it data analyst or business analyst? Answer: Sorry, there are two things wrong in the question from my point of view. There is a wrong assumption in the question that to become a data engineer you first need to get into some other job. A business analyst or a data analyst role is a job of its own and is not an entry-level job that leads to a data engineer job. From a BA the natural progression is to become a senior BA, and similarly from a data analyst you become a senior data analyst, and so on. Now that there is some clarity: from any of these roles, if the person is interested and ready to learn/unlearn, and there is an opportunity in the organization, the person can switch between these roles. As a data analyst is in general more technical than a business analyst (in the sense of requirements analyst), it would be easier for a data analyst than for a business analyst to switch to a data engineer role. However, the d

Best affordable business intelligence software in 2020?

Best affordable business intelligence software in 2020? This was one of the questions asked. Answer: You are actually referring to BI software in the narrow sense, i.e., limited to frontend tools such as MicroStrategy, Tableau, Power BI, etc., instead of referring to BI in the correct broader sense as an umbrella term for the end-to-end process of deriving information and insights from data. In any case, the answer depends on the specific requirements and current situation (current set of tools, software, strategy, etc.) of the business. In my view it is not right to state that one tool is best for all purposes and situations. We should select tools based on the needs and the situation. For example, if a company has invested a lot in Oracle products and Oracle provides additional software for BI for free, then why wouldn't you want to consider that as an option? Same with Microsoft, let's say some department has already purchased licenses for MS SQL, SSIS, SSAS, SSRS, MS Exce

Data Science job without experience

How to get a data science job without experience? This was one of the questions asked. Simple answer : One of the best ways in my view is to start as a junior/trainee/intern in a team and learn on the job with guidance from experienced people.

RAG Viz added to covid-stats.de

The RAG (Red, Amber, Green) visualization (image 1) has now also been added to the covid-stats.de dashboard (page 2). It provides a quick overview of the cities/towns within Germany that are coming out of COVID-19 and those that are nearing the end of the COVID-19 situation. Clicking a Bundesland (federal state) filters the view (image 2) to display only the cities/towns of that Bundesland. Using the optional metrics (icon on the top right of the map), one can also get information such as the total cases over the last 5 days (image 3), the average % change over the last 5 days, etc.
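
As a rough illustration of how such metrics and a RAG status could be computed from a series of daily new cases, here is a minimal sketch. The dashboard's actual rules are not described in this post, so the function names, the color thresholds and the reading of "last 5 days total cases" as a sum of daily new cases are all assumptions made for the example.

```python
def last_n_total(daily_new_cases, n=5):
    """Sum of new cases over the last n days."""
    return sum(daily_new_cases[-n:])


def last_n_avg_pct_change(daily_new_cases, n=5):
    """Average day-over-day % change across the last n days.

    Days whose previous value is 0 are skipped to avoid dividing by zero.
    """
    window = daily_new_cases[-(n + 1):]
    changes = [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(window, window[1:])
        if prev > 0
    ]
    return sum(changes) / len(changes) if changes else 0.0


def rag_status(daily_new_cases, n=5, green_max_total=0, amber_max_total=10):
    """Classify a city/town as Green, Amber or Red.

    The thresholds are hypothetical: Green means no new cases in the last
    n days, Amber means only a few, and Red covers everything else.
    """
    total = last_n_total(daily_new_cases, n)
    if total <= green_max_total:
        return "Green"
    if total <= amber_max_total:
        return "Amber"
    return "Red"


# Made-up daily new case counts for three towns.
print(rag_status([3, 2, 1, 0, 0, 0, 0, 0]))
print(rag_status([5, 4, 3, 2, 1, 1, 0]))
print(rag_status([10, 20, 30, 40, 50, 60]))
```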

RAG Viz added to AnalyzeCOVID.com

The RAG (Red, Amber, Green) visualization has now been added to the analyzecovid.com dashboard. The goal is to get a quick overview of the countries that are coming out of COVID-19 and those that are nearing the end of the COVID-19 situation.

BI Architect course and BI Tool question

Received two questions, one through LinkedIn and the other through Blogger. Sharing the details of the Q&A here for the benefit of a wider audience. Question 1: I want to start a career in BI, and I am planning to get trained in it. The institute covers MSSQL, Power BI, MSBI, Tableau Desktop 10, Informatica developer and admin, Azure Data Factory, SQL DBA, Qlik Sense, Data warehousing. Is this all that is needed? Reply: Regarding getting into BI, the best way is to join a team where you can learn from experienced people. Regarding courses, just based on the list, that's quite a lot for one training, but it depends on what exactly is provided as part of the training. I need more details about the training. Once the link was provided, I checked it and gave the reply below. Note - The link or institute name is intentionally not shared here, to avoid any impression that I am promoting any institute. Reply: I have had a look at the link, as a course for a b

Datapathy - Feelings of data?

Usually, technical content goes here on akvkbi.com and emotional content goes on my other blog, akclarity.com, to keep facts separate from emotions. But this time it is a combination of the two: it's data, and an imagination of what data would feel about BI (Business Intelligence), so I am posting it on both blogs. For those who understand the relationship between data and BI, this should resonate. I hope to cheer up all data and BI professionals and bring some positivity in these tough times. Note: A PDF version has been made available so that you can easily download it, share it, print it, etc.

COVID-stats.de

In collaboration with Dr. med. Don-Felix Ryzek, who is on the COVID floor treating patients, an interactive dashboard ( https://www.covid-stats.de ) has been built to help dig deeper into the details of Germany's current situation with COVID-19. For quick reference: Fälle = Cases, Todesfälle = Deaths, genesen = Recovered, Verdopplungszeit in Tagen = Doubling time in days, Gesamt = Total, M = Men, W = Women, U = Unknown
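
For readers unfamiliar with the doubling time metric, here is a minimal sketch of one common way to estimate it from two cumulative case counts, assuming exponential growth between the two observations. How the dashboard itself calculates "Verdopplungszeit in Tagen" is not stated in this post, and the function name and numbers below are made up for illustration.

```python
import math


def doubling_time_days(cases_start, cases_end, days_elapsed):
    """Estimate the doubling time in days from two cumulative case counts.

    Assumes exponential growth, which gives
    doubling time = days_elapsed * ln(2) / ln(cases_end / cases_start).
    Returns infinity when the count is not growing.
    """
    if cases_start <= 0 or cases_end <= cases_start or days_elapsed <= 0:
        return math.inf
    return days_elapsed * math.log(2) / math.log(cases_end / cases_start)


# Made-up example: cumulative cases grow from 1000 to 2000 within 7 days.
print(doubling_time_days(1000, 2000, 7))
```

A longer doubling time means slower spread, so a rising value of this metric is good news.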

AnalyzeCOVID.com

The Corona Virus dashboard has been upgraded and embedded into the AnalyzeCOVID.com website for better readability. So you no longer have to store/remember the long URL of the Google Data Studio dashboard; instead, you can simply remember the website address AnalyzeCOVID.com. This site is mainly for users who are interested in digging deeper, understanding the numbers better and searching for trends, patterns and anomalies.

A $100 Million Company without BI?

Do you know of a company in this generation, from any part of the world, that has an annual revenue of 100+ million US dollars, is growing, and doesn't use business intelligence / business analytics / data science at all? If yes, please do let me know. I have a few questions that I would like to ask as part of my research. To be very clear, I am looking for an active company that fulfills the criteria below: 1) Yearly revenue of 100+ million US dollars. 2) Doesn't have BI / Analytics / Data science teams. 3) Doesn't have data warehouses, data lakes, data marts, etc. 4) Doesn't use any SaaS or other cloud variants of BI / Analytics solutions. 5) Doesn't use any app analytics, web analytics, etc.; basically doesn't use data for analytical purposes at all.

Publications Office of the EU shares my Corona Virus Analytics Dashboard

Thanks to the Publications Office of the European Union for sharing my Corona Virus (COVID-19) Analytics Dashboard on their website ( https://op.europa.eu/en/web/eudatathon/covid-19 ). In these difficult times, all that some of us can do is create as much awareness as possible and provide good quality information so that people understand the seriousness of the issue we have. In this way we try our best to reduce the number of people getting infected and thereby save at least a few lives.

Corona Virus Analytics with prediction

Continuing from my previous post on analytics using Corona Virus data, I managed to come up with a chart that shows a prediction for the next 30 days and will be automatically updated based on new trends. The predicted numbers look very bad. I wasn't sure whether I should share the chart, but decided that it is better to share it, create awareness and prove the prediction wrong than to not share it and later find out that the prediction was right. The dashboard now has 5 pages in total (a PDF version is attached below). For the interactive dashboard, continue using the same link as in the previous post.

Corona Virus Analytics

This is an interactive dashboard that can be used to find trends, patterns and eventually insights related to COVID-19 (Corona Virus) from a reliable data source. Additional notes are provided below the dashboard. The direct link to the Google Data Studio dashboard is here - https://datastudio.google.com/u/0/reporting/2e4f9914-8905-4cc9-b36a-75dc3f220438/page/7JbY Best viewed on desktop/laptop. Notes: You can use the filters at the top or click on the charts to filter. Feel free to embed the dashboard in your websites (for non-commercial use, of course). Currently the dashboard doesn’t have real-time information. There is a delay of about 24 hours, as the data is mainly sourced from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data ( 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE). For some o

Data Science questions

I would like to know the answers to the questions listed below. I would be happy to get answers from as many organizations as possible, to understand how this works in various organizations. It would be good to share them here in the comments so that everyone can benefit. Who ensures that the data scientist has chosen all the data fields (features) that are required? For example, for a customer churn prediction, let's assume that the data scientist didn't consider one of the data sources (call center data). Who is going to ensure that the data scientist considers call center data too? If the data (in this case, call center data) is not available in the data lake/data warehouse or in any database other than the OLTP DBs, will the data scientist fetch the data from the OLTP systems, or just raise a request to the data engineering team to bring the data into the data lake/DWH? Who actually tests the code written by a data scientist? Who verifies that

Big data developer or BI developer

Question: Which one adds more value and has more pay? Answer: Firstly, big data is a buzzword. There is data, and there are processes to derive information and insights from data. Business Intelligence is the concept/process of deriving information and insight from data to enable fact-based decision making to improve a business. Seen in that context, BI is an umbrella term, and all activities that have to do with deriving information and insights from data fall under BI. However, because of the current hype, almost every organization wants to showcase that it is also making use of big data. The truth is that some of these companies have not even made use of the basics (“easily available data”). I know at least two companies in Europe (Germany) where people were hired as big data developers but there was nothing for them to do, as there was no big data tech stack yet. So the answer to your question depends on which organization and what stage of data journ

Tools recommendation for an aspiring BI Analyst

Question: As an aspiring BI analyst, which tools should I learn? Answer: Please note that there are differences between a BI Analyst, a BI BA, a Data Analyst and a Data Scientist. I will keep that for another post if you are interested. In general for a BI Analyst it is not mandatory to have programming skills or SQL skills. So based on this, here is the list as of today: SQL (must have); Python (good to have). Experience with at least one of the tools listed below is a "must have", and experience in multiple tools is good to have: Power BI, Tableau, MicroStrategy Desktop and Enterprise, MS Excel (advanced), Google Sheets, Google Data Studio, Qlik Sense, IBM Cognos Analytics. Apart from these, it is also important to have good communication and presentation skills.

Can I switch from Finance domain to BI?

Question from a BI aspirant: "Can I switch from Finance domain to BI? I have been working in Finance for 3 years, do I have to do masters in Business Analytics or Business Intelligence to get into BI?" Answer: Doing a master's in Business Analytics or Business Intelligence is only one of the ways to get into BI, so it is not mandatory. Please note that BI has various roles, for example: BI Developer (Frontend/Reporting and Analytics platform, Backend/ETL developer), BI Analyst, BI BA, etc. So the path you take depends on the role you are interested in.

3 years of IBI

This week (on 19th Feb 2020) I completed 3 years of IBI (Individual Business Intelligence). To celebrate it, I have created a dashboard in Google Data Studio. So from now on, instead of using the charts in Google Sheets, I will use the dashboard. Google Data Studio is more user friendly and mobile friendly too, and it comes with easy-to-use filters and drill down/up capabilities. A PDF version of the first 2 pages of the shareable dashboard is attached below. I am happy to share the first 2 pages of the IBI dashboard; reach out to me if you would like to get the Google Data Studio IBI dashboard. Once you get the dashboard (without my data, of course) you can make a copy of it, point it to your IBI data (assuming you have started), add more charts, pages, etc., and use it for your benefit. As with every year, this IBI year too I have learnt a lot about myself, made new mistakes and changed several things about myself. The benefits of IBI are much greater than we can think of. Use it to know it. And what are some of the

How many copies of a payment transaction record?

I am curious to know approximately how many copies of a payment transaction record are stored across all systems of all parties involved in the payments sector. Based on a simple and rough calculation, I think it is a minimum of 60 copies of each payment transaction. Of course, not every party has access to every field (column or data point); each party has access only to the part of the record that it is authorized to access. So it looks like we (all parties combined) use at least 60 times more storage than is required to store a payment transaction record. I'm open to learning from others if I have missed some parties/systems/data layers. And is there a possibility to centralize storage across all parties, storing the record once and providing parties/systems with access to parts of it instead of duplicating it? Maybe a point to think about.

At least 3 people have started IBI this year

Very happy to know that at least 3 people whom I know (and who have informed me) have started with IBI (Individual Business Intelligence) from 1st Jan 2020. One is a close friend and ex-Wirecard who has started with 20+ data points, another is a colleague who would like to capture data, do predictive analytics using his data and check if his predictions come true, and the 3rd one is my 6 year old son, who with his mama's help has started capturing various data points including how many times he cried in a day, how many times he fought, how much time he spent learning something, etc. Best wishes to all 3 of them. I am sure more people will realize the benefits of IBI. To create awareness about IBI, I plan to introduce IBI in the different languages that I speak. I have started the series with one introductory video in the Kannada language. Please find it below, and please share it with friends and family who speak Kannada. Goal is to create as much awareness, including those, who
