ClodHopper is a Java library for high-performance clustering of numerical data. It contains clustering implementations such as K-Means, K-Means++, X-Means, G-Means, Fuzzy C-Means, Jarvis-Patrick, and various forms of hierarchical clustering. The implementations use the host system's concurrent processing capability to speed up clustering, and the data structures are kept lean to conserve memory. ClodHopper is also extensible: if you are developing a new clustering algorithm, extending a ClodHopper base class may save you a considerable amount of work.
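For readers unfamiliar with the algorithms named above, the following is a minimal, single-threaded sketch of K-Means with K-Means++ seeding in plain Python. It is not ClodHopper code and does not use its API; the function names (`kmeans`, `kmeans_pp_seeds`) are hypothetical, and the sketch assumes the input contains at least `k` distinct points.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng):
    """K-Means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes at least k distinct points (otherwise all weights can be zero)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centers = kmeans_pp_seeds(points, k, rng)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return centers, clusters
```

Calling `kmeans(points, 2)` on a list of coordinate tuples returns the final centers and the list of points assigned to each. A concurrent implementation would typically parallelize the assignment step, since each point's nearest center can be computed independently.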
MyMediaLite is a lightweight, multi-purpose library of recommender system algorithms. It addresses the two most common scenarios in collaborative filtering: rating prediction (e.g. on a scale of 1 to 5 stars) and item prediction from implicit feedback (e.g. from clicks or purchase actions). It contains dozens of recommender engines, including state-of-the-art matrix factorization methods. It also supports real-time updates to the recommender engines, saving engines to disk and reloading them, and several evaluation measures for comparing the accuracy of different recommender system methods. Three command-line programs are included that expose most of the library's functionality.
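To illustrate the rating prediction scenario, here is a minimal sketch of matrix factorization with user and item biases, trained by stochastic gradient descent, in plain Python. It does not use MyMediaLite or reproduce its API; the names (`train_mf`, `predict`, `rmse`), the hyperparameter values, and the 1 to 5 rating range are assumptions chosen for the example.

```python
import random

def train_mf(ratings, n_users, n_items, n_factors=2,
             epochs=500, lr=0.05, reg=0.02, seed=0):
    """Fit rating ~ mean + user_bias + item_bias + dot(P[u], Q[i]) by SGD.
    `ratings` is a list of (user_index, item_index, rating) triples."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    # Small random initial factors so the gradients are not all zero.
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mean + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            # Gradient steps with L2 regularization on biases and factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return {"mean": mean, "bu": bu, "bi": bi, "P": P, "Q": Q}

def predict(model, u, i, lo=1.0, hi=5.0):
    """Predicted rating, clamped to the assumed rating scale [lo, hi]."""
    dot = sum(p * q for p, q in zip(model["P"][u], model["Q"][i]))
    raw = model["mean"] + model["bu"][u] + model["bi"][i] + dot
    return max(lo, min(hi, raw))

def rmse(model, ratings):
    """Root mean squared error of the model on (user, item, rating) triples."""
    se = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

After `model = train_mf(ratings, n_users, n_items)`, calling `predict(model, u, i)` estimates a rating for a user-item pair that may not appear in the training data, and `rmse` is one example of an evaluation measure for comparing methods on held-out ratings.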
Allmon is a generic system for collecting and storing runtime metrics used to monitor system performance, health, quality, and availability. The system also provides a set of data-mining algorithms useful for further performance analysis. Allmon is designed to harvest metric values from many areas of the monitoring infrastructure. The collected data support quantitative and qualitative analysis of performance and availability. Allmon works with other analytical tools for OLAP multidimensional analysis and data mining. The tool can be used in production as well as for development (profiling) and QA (load testing).
MAPDAV (More Accurate Password Dictionary Attack Vector) uses what the /etc/passwd file on Unix/Linux systems reveals about users to generate a dynamic dictionary of more likely password guesses. It does this by mangling each user's username and user information in various user-specified ways in order to expose poor password practices.
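The idea of deriving guesses from account information can be sketched from the defensive side, as a check that rejects passwords derivable from a user's own /etc/passwd entry. The Python sketch below is not MAPDAV code; the function names and the handful of mangling rules (case changes, reversal, the suffixes "1" and "123") are hypothetical stand-ins for the user-specified rules described above. It relies only on the standard passwd layout of seven colon-separated fields, with the full name conventionally first in the comma-separated fifth (GECOS) field.

```python
def parse_passwd_line(line):
    """Return (username, [name words]) from one /etc/passwd line.
    Layout: name:password:uid:gid:gecos:home:shell"""
    fields = line.rstrip("\n").split(":")
    username = fields[0]
    gecos = fields[4] if len(fields) > 4 else ""
    full_name = gecos.split(",")[0]  # first GECOS subfield is the full name
    return username, full_name.split()

def mangle(word):
    """Apply a few illustrative (hypothetical) mangling rules to one word."""
    variants = {word, word.lower(), word.upper(), word.capitalize(), word[::-1]}
    # Append common numeric suffixes to every variant produced so far.
    variants |= {v + suffix for v in list(variants) for suffix in ("1", "123")}
    return variants

def candidate_guesses(line):
    """All mangled variants of the username and each word of the full name."""
    username, name_words = parse_passwd_line(line)
    guesses = set()
    for word in [username, *name_words]:
        guesses |= mangle(word)
    return guesses

def is_derivable(password, line):
    """True if the password matches a guess derived from the account entry."""
    return password in candidate_guesses(line)
```

A password-change hook could call `is_derivable(new_password, entry)` and refuse any password that returns `True`.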