Data Science questions

Data Science questions that need some explanation

I would like to know how different organizations handle the questions listed below. Answers from as many organizations as possible would help show how practice varies. Please share yours in the comments so that everyone can benefit.

Who ensures that the data scientist has selected all the data fields (features) the problem requires? For example, if a data scientist building a customer churn prediction model overlooks one data source (say, call center data), who makes sure that the call center data is considered too?

If the data (in this example, the call center data) is not available in the data lake, the data warehouse, or any database other than the OLTP databases, will the data scientist fetch it directly from the OLTP systems, or raise a request with the data engineering team to bring it into the data lake/DWH?

Who actually tests the code written by a data scientist?

Who verifies that the training data selected by a data scientist is of good quality and reflects real situations? If the data is of poor quality or does not cover real situations, the model will likely be wrong and produce bad results.
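Whoever ends up owning this check, part of it can be automated so that the reviewer looks at a report instead of raw rows. The following is only a minimal sketch of that idea, not a description of how any particular organization does it. The field names (`tenure_months`, `call_center_contacts`, `churned`), the sample rows, and the thresholds are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def quality_report(rows, required_fields, label_field):
    """Summarise missing-value rates and label balance for a list of dict records."""
    total = len(rows)
    if total == 0:
        raise ValueError("no rows to check")
    # Share of rows in which each required field is absent or None.
    missing_rate = {
        field: sum(1 for row in rows if row.get(field) is None) / total
        for field in required_fields
    }
    # Share of rows carrying each label value (e.g. churned vs. not churned).
    label_counts = Counter(row.get(label_field) for row in rows)
    label_share = {label: count / total for label, count in label_counts.items()}
    return {"rows": total, "missing_rate": missing_rate, "label_share": label_share}


def find_issues(report, max_missing_rate=0.2, min_label_share=0.05):
    """Turn a report into human-readable warnings (thresholds are illustrative)."""
    issues = []
    for field, rate in report["missing_rate"].items():
        if rate > max_missing_rate:
            issues.append(f"{field}: {rate:.0%} of rows are missing a value")
    for label, share in report["label_share"].items():
        if share < min_label_share:
            issues.append(f"label {label!r}: only {share:.0%} of rows")
    return issues


# Hypothetical churn training rows; None marks a value that was never loaded.
sample_rows = [
    {"tenure_months": 12, "call_center_contacts": 3, "churned": 1},
    {"tenure_months": 40, "call_center_contacts": None, "churned": 0},
    {"tenure_months": 5, "call_center_contacts": None, "churned": 0},
    {"tenure_months": 24, "call_center_contacts": 0, "churned": 0},
]

report = quality_report(
    sample_rows,
    required_fields=["tenure_months", "call_center_contacts"],
    label_field="churned",
)
issues = find_issues(report)
for issue in issues:
    print(issue)
```

A report like this only catches mechanical problems such as missing values or a badly skewed label. Whether the data reflects real situations still needs someone with domain knowledge to review it.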
