DataCleaner

DataCleaner is a data quality analysis tool for data profiling, validation, and minor ETL-like tasks. These activities help you manage and monitor data quality so that your data remains useful and applicable to your business situation. It can be used in master data management (MDM), data warehousing projects, statistical research, preparation for extract-transform-load (ETL) activities, and more.

Last announcement

Community contributor contest 08 Nov 2012 14:20

Recent releases

  •  21 May 2014 12:51

    Release Notes: A new major feature, duplicate detection, lets you find duplicate records in your data using fuzzy matching. A new analyzer checks referential integrity between tables from multiple sources. Progress indication has been improved and is more responsive.

  •  15 Mar 2014 03:50

    Release Notes: You can now compose jobs so that a DataCleaner job invokes another "child" job as a single transformation. Source column handling was improved, and the user can now choose which columns to include in a source query. Repository file locking was implemented to prevent concurrent reads and writes.

  •  24 Sep 2013 13:16

    Release Notes: The 'Synonym lookup' transformation now has an option to look up every token of the input. This is useful when replacing synonyms within the values of a long text field. A potential failure was fixed when blocking execution of DataCleaner jobs through the monitor's Web service. The way jobs and the sequence of components are closed and cleaned up after execution was improved. The Java WebStart version of DataCleaner was affected by a bug in the Java runtime that, under certain circumstances, caused certain JAR files not to be recognized by the WebStart launcher.

  •  05 Sep 2013 12:39

    Release Notes: It is now possible to hide output columns of transformations. Hiding does not affect the processing flow; it only removes the columns from the user interface, potentially making the experience cleaner when interacting with other components. A new Web service has been added to the monitoring Web application which provides a way to poll the status of the execution of a particular job. A bug has been fixed which caused the HTML report to fail for certain analysis types when no records had been processed. Six other minor bugs have been addressed.

  •  12 Jun 2013 07:55

    Release Notes: This release adds a new filter for performing Change Data Capture, queues job execution to avoid concurrent execution issues, and includes several minor bug fixes and improvements.
