to introduction, in this article I am going to discuss “Data
Preprocessing”, an important step in the knowledge discovery process
that can even be considered a fundamental building block of data mining.
People who come from a data warehousing background may already be
familiar with the term ETL (which stands for Extraction, Transformation
and Loading).
The success of any data mining or data warehousing effort depends on how
well the ETL is performed. DP (I am going to refer to data preprocessing
as DP henceforth) is a part of ETL; it is nothing but transforming the
data. To be more precise, it means modifying the source data into a
different format which (i) enables data mining algorithms to be applied
easily, (ii) improves the effectiveness and the performance of the
mining algorithms, (iii) represents the data in a format that is easy
for both humans and machines to understand, (iv) supports faster data
retrieval from databases, and (v) makes the data suitable for the
specific analysis to be performed.
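
To make the idea of "modifying the source data into a different format" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample records, the field names (age, income, city) and the helper names (to_float_or_none, preprocess) are hypothetical. The three steps shown (filling missing values with the column mean, min-max scaling, and one-hot encoding) are just common examples of DP, not the only way to do it.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from a source system.
# Blank strings stand for missing values.
raw_records = [
    {"age": "25", "income": "30000", "city": "Pune"},
    {"age": "",   "income": "52000", "city": "Delhi"},
    {"age": "40", "income": "",      "city": "Pune"},
]

NUMERIC_FIELDS = ["age", "income"]
CATEGORICAL_FIELDS = ["city"]


def to_float_or_none(text):
    """Convert a text field to float; treat blank text as missing."""
    text = text.strip()
    return float(text) if text else None


def preprocess(records):
    """Return (column_names, rows) in which every value is numeric.

    Assumes each numeric field has at least one known value.
    """
    # Step 1: parse numeric text, remembering which values are missing.
    parsed = [
        {f: to_float_or_none(r[f]) for f in NUMERIC_FIELDS} for r in records
    ]

    columns = []
    rows = [[] for _ in records]

    for field in NUMERIC_FIELDS:
        known = [p[field] for p in parsed if p[field] is not None]
        fill = mean(known)  # Step 2: impute missing values with the mean.
        values = [p[field] if p[field] is not None else fill for p in parsed]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # Avoid dividing by zero on constant columns.
        columns.append(field)
        for row, value in zip(rows, values):
            row.append((value - lo) / span)  # Step 3: min-max scale to [0, 1].

    # Step 4: one-hot encode each categorical field.
    for field in CATEGORICAL_FIELDS:
        for category in sorted({r[field] for r in records}):
            columns.append(f"{field}={category}")
            for row, record in zip(rows, records):
                row.append(1.0 if record[field] == category else 0.0)

    return columns, rows


columns, rows = preprocess(raw_records)
print(columns)
for row in rows:
    print(row)
```

After this transformation every record is a fixed-length list of numbers, which is the kind of format most mining algorithms expect as input (point (i) above), while the column names keep the result readable for humans (point (iii)).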