Public Data Warehouse

Vision

Build the Public Data Warehouse: the world's best and biggest data warehouse, serving the general public throughout the world.

Details


Public Data Warehouse (PublicDW)

The Public Data Warehouse is a data warehouse built by the PublicDW community and used by people across the world. I believe PublicDW has the potential to become the world's best and biggest data warehouse. Biggest, because the data stored in it will not be limited to particular products, subject areas, departments, enterprises or organizations, and its use will not be limited to any specific group: it will be available to the general public across the world. That is also one of the biggest challenges. Best, because it will be developed with the best practices, technologies, tools and methods.

Currently, there is no data warehouse that a global community can build and that users around the world can use. As far as I know, and based on a quick search, this will be the very first attempt at something like this. Data warehouses are usually built at department, enterprise or organization level, or at most at national level but limited to one or more subject areas.

I know I'm inviting trouble with this project, my biggest DWBI challenge to date. However, I just cannot stop myself from starting this hobby project, which may take up the rest of my life's spare time, and probably the spare time of many more people passionate about DWBI, to make PublicDW a reality.

I'm aware that, according to some statistics, over 70% of commercial data warehousing projects fail, and that there is a high chance this project will fail too. However, I have decided not to worry about the results but to enjoy the journey, so I would like to give it a good try and give it my best. Even though I have had a 100% success rate implementing multiple DWBI solutions for various companies in my 10+ years of DWBI experience, I was apprehensive about starting this project. Now I would like to push my boundaries as far as possible, challenge myself and learn from the experience.

It's not only about pushing my boundaries and challenging myself. I also see a real need for something like this, and huge potential in PublicDW. To begin with, there are many experienced people who want to improve their current DWBI skills, learn new ETL and BI reporting tools, learn different aspects of the DWBI space, take on more complex DWBI projects than their current job can offer, show what they are capable of, and add achievements to their CV that they can show to current and future employers. There are also many people who want to start a career in DWBI or switch to DWBI from a non-DWBI job, and students who want hands-on experience on a real DWBI project. Yet there isn't really a good, globally available platform where all these people can get real DWBI project experience. I know this from the number of questions (public and private) I get on Quora.com.

OK, but how is this going to be built?
I welcome everyone, from expert DWBI professionals to beginners and students to DWBI vendors and sponsors, to join this project so that we can build PublicDW together. Whichever of the roles below you currently perform, all of them are required for the project:
  • ETL Developer
  • BI Report Developer
  • Dimensional Modeler
  • BI Architect
  • DWBI Business Analyst / Product Owner
  • Business Intelligence Analyst
  • BI Reports Tester
  • ETL Tester
  • DWBI Project Manager
  • ETL Administrator
  • BI Reporting platform Administrator
  • Server Admin (Linux/Windows)
  • Network Admin
  • BI Team Lead / Head
  • Data Steward
  • Data Entry Operator
  • Beginner
  • Student
  • Others

To beginners/students
To start with, I will offer direction and guidance for building PublicDW. I will (hope to) arrange the necessary platform for collaborative development. I will also offer online training on topics like BI, data warehousing, dimensional modelling, and selected ETL and BI reporting tools on pre-agreed weekends or holidays. Beginners and students can contribute by working on real projects and building data marts and reports under my guidance or that of another experienced DWBI person.

For those who don't know me: I have hands-on experience with multiple ETL tools such as DataStage, BODS and Ab Initio, and a high-level understanding of Pentaho Data Integration. I have hands-on experience with multiple BI reporting tools such as Business Objects and MicroStrategy, and a high-level understanding of other reporting and visualization tools. I have hands-on experience with multiple databases such as Oracle, DB2 and MySQL, and a high-level understanding of other databases. I also have experience as BI Architect, BI Business Analyst, BI Product Owner and BI Team Lead, and I am the creator of KABI and IBI. So you can be sure that you are in safe hands. I consider myself a learner forever, so I will keep learning throughout this process too.

To DWBI Professionals
You can join me in mentoring, guiding and training beginners and students. Develop high-quality ETL, dimensional models and reports using best practices, and set the path for beginners and students to follow. Own one or more data marts and get them implemented. Learn another tool from someone experienced in it, and teach them a tool you know. Try building an ETL flow with multiple ETL tools and see which one suits the purpose best; the same goes for reporting tools and databases. I am also fully open to your ideas as long as they align with the vision.

To DWBI Vendors
You can support us by providing the necessary software and hardware so that we can work with your tools too.

To potential sponsors
Initially we may require sponsors so that we can use the best tools and technologies and not restrict ourselves to open source tools. Later, once PublicDW has been created, we could make it self-financing by moving to a freemium business model: free for the general public, with paid services for businesses.

And finally the users
Huge amounts of data are already publicly available, and we will start with whatever we think is the best starting point. You can let us know at any time which publicly available data you would like us to bring into the data warehouse, or which reports you would like built, and we will plan it and give it higher priority.

High level implementation plan
As PublicDW will be built using the bottom-up approach (the Kimball way), we will build several data marts, and these data marts will be linked using conformed dimensions to form the Public Data Warehouse. Each data mart could have one implementation owner, who takes full ownership of getting that data mart implemented. Initially we will start with one set of tools. Once we have access to multiple tools, we could develop the same data mart using another set of tools and compare which one implements it best. The same applies to creating reports on top of the data marts: we can develop the same reports using multiple tools and figure out which reporting tool is best for the purpose. In terms of timelines: develop, test and release when and what you can. All work-in-progress items should be saved centrally so that when you can't continue, another member can pick up where you left off and complete the work.
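
To make the idea of data marts linked by conformed dimensions a little more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is an assumption made for illustration: the table names (dim_country, fact_population, fact_schools), the function names and the numbers are invented and are not part of any actual PublicDW design. It shows two small data marts that share one country dimension, and a report that combines facts from both marts through that shared dimension.

```python
import sqlite3


def build_demo_warehouse():
    """Create two tiny, hypothetical data marts sharing one conformed dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conformed dimension: a single shared definition of "country".
        CREATE TABLE dim_country (
            country_key  INTEGER PRIMARY KEY,
            country_name TEXT NOT NULL UNIQUE
        );
        -- Data mart 1 (hypothetical): population facts.
        CREATE TABLE fact_population (
            country_key INTEGER REFERENCES dim_country(country_key),
            year        INTEGER,
            population  INTEGER
        );
        -- Data mart 2 (hypothetical): school facts.
        CREATE TABLE fact_schools (
            country_key  INTEGER REFERENCES dim_country(country_key),
            year         INTEGER,
            school_count INTEGER
        );
    """)
    # Invented sample rows, purely for illustration.
    conn.executemany("INSERT INTO dim_country VALUES (?, ?)",
                     [(1, "Country A"), (2, "Country B")])
    conn.executemany("INSERT INTO fact_population VALUES (?, ?, ?)",
                     [(1, 2020, 1000), (2, 2020, 4000)])
    conn.executemany("INSERT INTO fact_schools VALUES (?, ?, ?)",
                     [(1, 2020, 10), (2, 2020, 20)])
    conn.commit()
    return conn


def people_per_school(conn, year):
    """Combine both marts through the shared country dimension.

    Each fact table is aggregated on its own first, and the results are
    then joined on the conformed dimension key, so rows in one mart
    cannot multiply rows from the other.
    """
    return conn.execute("""
        WITH pop AS (
            SELECT country_key, SUM(population) AS population
            FROM fact_population WHERE year = ? GROUP BY country_key
        ),
        sch AS (
            SELECT country_key, SUM(school_count) AS school_count
            FROM fact_schools WHERE year = ? GROUP BY country_key
        )
        SELECT d.country_name, pop.population * 1.0 / sch.school_count
        FROM dim_country d
        JOIN pop ON pop.country_key = d.country_key
        JOIN sch ON sch.country_key = d.country_key
        ORDER BY d.country_name
    """, (year, year)).fetchall()


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    for name, ratio in people_per_school(warehouse, 2020):
        print(name, ratio)
```

Because both fact tables use the same country keys, a new data mart built by another owner can join the warehouse simply by reusing dim_country instead of defining its own country list.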

What are the benefits of PublicDW?
From the sections above you can probably already see various benefits of PublicDW. To list them:
  • PublicDW could become the one-stop source for the data-warehouse-backed reporting needs of the general public throughout the world.
  • DWBI professionals can challenge themselves with this project and gain experience.
  • DWBI professionals can become experts in different areas of DWBI through cross-training.
  • DWBI professionals can add their contributions to their CV and actually showcase the work they have done to future employers.
  • DWBI professionals can offer their services (report creation, training, etc.) to businesses.
  • Beginners/students can get training and hands-on experience on a real project that the whole world is aware of. 
  • DWBI vendors can test their tools with this project. E.g. carry out performance testing.
  • Businesses/companies can make their DWBI tool selections by seeing the tools in action.
  • Training institutes can provide training to their students using this platform.
  • If PublicDW starts offering premium services to businesses, we could earn from it too.
  • Many more

What are the challenges in building PublicDW?
There are many challenges. Those currently known are listed below:
  • Getting sponsors
  • Co-ordination, collaborative working and release planning
  • Data Quality
  • Data Governance
  • Maintaining high availability
  • Ensuring sufficient time is spent
  • Ensuring consistency
  • Training for beginners
  • Training for users
  • Making it easy to use for the general public

How are we going to start?
The first step is to create awareness, which is the purpose of publishing this article. I will also share it on other sites like LinkedIn, Quora and Facebook. You could also share it with your connections. Once we have enough members in the group (community) and the platform is ready, we can start with the training and project work. In the meantime we could already start identifying possible data sources. Let's start the journey to build the Public Data Warehouse, the world's best and biggest data warehouse.

For updates check the Public Data Warehouse page 

Comments

  1. Thanks to Atlassian for great software like Confluence and JIRA, which let me quickly and easily start on the documentation and task management for the Public Data Warehouse project. Further updates will be posted directly on the website Publicdw.com.

  2. Thanks for the comment. Please refer to publicdw.com (https://publicdw.atlassian.net/wiki/spaces/PUB) for the project page.

  3. The Public Data Warehouse project has been taken over by the PublicBI company (https://www.publicbi.com/public-data-warehouse). Feel free to join the project if you like.

  4. Replies
    1. Hi Guys,

      If you want to advertise just send me the details and I will add it to the blog.

      Best Regards,
      Anoop