This research will employ the MapReduce framework to build a highly parallel ETL (extract-transform-load) system for data warehousing in distributed environments. We extend pygrametl, the code-based ETL framework of Thomsen et al., by applying MapReduce so that ETL programs can be parallelized automatically. The aim is a programming framework for ETL that delivers high performance while keeping development fast and easy. Programming examples will be provided, and a case study will be used to evaluate performance.
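To make the idea concrete, the following is a minimal sketch of how an ETL flow can be phrased as map and reduce steps. It is an illustration under stated assumptions, not the proposed framework and not pygrametl's API: the row layout, the function names (`map_row`, `shuffle`, `reduce_group`, `run_etl`), and the use of a local thread pool in place of a real MapReduce cluster are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source rows; in practice these would be read from files or a database.
SOURCE_ROWS = [
    {"product": "Apple ", "store": "North", "amount": "3"},
    {"product": "pear", "store": "north", "amount": "2"},
    {"product": "apple", "store": "South", "amount": "5"},
]


def map_row(row):
    """Transform step: clean one source row and emit a (key, value) pair."""
    key = (row["product"].strip().lower(), row["store"].strip().lower())
    return key, int(row["amount"])


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce runtime would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Load step: aggregate one group into a single fact row."""
    product, store = key
    return {"product": product, "store": store, "total_amount": sum(values)}


def run_etl(rows, workers=2):
    """Run the map phase on a local pool, then shuffle and reduce.

    The thread pool is only a stand-in for distributed map tasks (assumption);
    rows are independent, so the map phase needs no coordination.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pairs = list(pool.map(map_row, rows))
    groups = shuffle(pairs)
    # Sort by key so the output order is deterministic.
    return [reduce_group(key, values) for key, values in sorted(groups.items())]


if __name__ == "__main__":
    for fact in run_etl(SOURCE_ROWS):
        print(fact)
```

Because each row is transformed independently in `map_row`, the map phase can be split across workers without the ETL developer writing any parallel code, which is the kind of automatic parallelism the abstract refers to.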