a set of (pre-)processing techniques that makes the data respect certain correctness and completeness rules needed to apply intelligent data analysis techniques; the main categories include eliminating duplicates and other sources of error, and filling in incomplete or missing fields in the data.
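
The two categories named in this definition, eliminating duplicates and filling missing fields, can be illustrated with a minimal Python sketch. The record layout (`id`, `age`), the helper names `drop_duplicates` and `fill_missing`, and the choice of mean imputation are all assumptions made for illustration; the definition itself does not prescribe any particular method.

```python
from statistics import mean


def drop_duplicates(records):
    """Keep the first occurrence of each exact-duplicate record, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        # A record is identified by its full set of (field, value) pairs.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def fill_missing(records, field):
    """Return copies of the records with None in `field` replaced by the
    mean of the observed values (one simple imputation choice among many)."""
    observed = [r[field] for r in records if r[field] is not None]
    if not observed:
        # Nothing to impute from; return unmodified copies.
        return [dict(r) for r in records]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


# Hypothetical sample data: one exact duplicate and one missing value.
rows = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 40},
]

cleaned = fill_missing(drop_duplicates(rows), "age")
print(cleaned)
```

Here the duplicate record is removed first so that it does not bias the imputed value; whether that ordering is appropriate depends on the data and on the analysis that follows.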