ETL Testing

Usually in data warehousing/business intelligence projects, ETL (Extract, Transform, Load) tools such as Informatica, Datastage, Talend, etc., are used to design ETL Jobs. Some companies still use scripting to develop ETL jobs.

ETL testing can be as simple as

Testing an ETL job manually by running it from a GUI verifying that the job runs, and validating the data loaded against expected data.

or can be complex like

Automating test data creation using scripting, SQL and tools like SOAP UI, using tools like Jenkins or ETL tools itself to trigger automated test data creation scripts and then automatically run the actual ETL job and then automated comparison of the loaded data against baselined data and highlight anomalies (if any) and auto check data against expected results for every single test case that was created earlier. Integrate the test scripts and test data creation scripts with the full set of ETL jobs so that it can be run during regression testing of the ETL Jobs when one of the existing jobs changes. Also includes performance testing (volumes, run times, impact on source systems).

In my view, the job market for an ETL tester is very small compared to a Job market for an ETL developer. Many projects do not have specific role of ETL tester. These projects usually have a end to end (e.g. reports, portal, DB and ETL) tester rather than specific to ETL, or the developers themselves review and test each others' code. So the number of openings specific to ETL testing will be less and it could reduce further with increasing test automation. I also think that it's not good to limit a profile to only ETL testing. Manual ETL testing is usually carried out by offshore based IT companies that take advantage of cheap labour. The Manual testing of ETL jobs and reports has to stop. Those into writing scripts for automation of ETL and report testing and automated test data setup will be valued more and have better future compared to manual testers.

Capabilities of the ETL tool itself could be used to create separate set of jobs (test project) to carry out data comparisons, validations and write results to file or DB. If you use a reporting tool you could analyze the data with that.


Popular posts from this blog

ETL developer vs Data engineer

KABI - The new Agile Methodology for BI Projects - Implement BI projects quicker happily

BI Architect course and BI Tool question