Clubmask Resource Manager

Clubmask is a 'glue' package that combines the outstanding management and speed of the Bproc distributed process layer with the power and configurability of the Maui HPC Scheduler. It uses the Supermon resource monitor to gather node information, combines that information with job submission data, and supplies the result to Maui. Maui issues job start and termination commands, which Clubmask handles via the Bproc layer. Clubmask also supplies a 'supermon2ganglia' translator that allows Supermon data to be displayed in a Ganglia Web frontend.


Recent releases

  •  15 Dec 2003 22:58

    Release Notes: Support has been added for a runtime configuration option that uses ganglia to gather node data, in addition to the existing support for supermon. Ganglia is now the preferred subsystem, as it is much more stable.

  •  18 Nov 2003 02:17

    Release Notes: The job names (JOBID) have been changed from absolute timestamps to a more normal "string.number" format, where "string" is an arbitrary job name that defaults to the username, and "number" is the number in the sequence of that particular job name. Many options have been added to cmsubmit. A supermon_state daemon that handles node state in supermon has been added, which moves this logic out of resource_manager. There are many more changes.

  •  17 Nov 2003 21:05

    Release Notes: CPU speed gathering was fixed; previously, only the last node's speed was used for all of the nodes. cmdbrestore has been fixed to restore a singleton tuple. The Supermon recv and revive_nodes methods have been cleaned up. There are many smaller fixes.

  •  29 Jul 2003 17:29

    Release Notes: The main change was a rework of the SupermonInterface class, which adds a few new classes and splits up the error handling in a sane fashion. This should make the supermon data retrieval much more stable. Also added is the ability to use bpsh and/or ssh on each node to make sure a job is really killed.

  •  16 Jul 2003 12:24

    Release Notes: The code in the mauichksummodule now makes sure that the checksum is null terminated for Py_BuildValue. In ResourceManager::Machine, BprocSupermon is now allowed to find nodes. In BprocSupermon, the logic in findNodes was fixed to make sure that supermon sees all of the nodes that bproc does. This also solves a problem where too much data was returned by each 'findNodes' call. supermon is now only contacted if there are nodes that need to be added. In IdResolv, a bug was fixed in which a node with a leading 0 would not match the nodeid that supermon assigned.
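
The "string.number" job name format described in the 18 Nov 2003 release notes can be illustrated with a minimal Python sketch. This is not Clubmask's actual code: the JobIdAllocator class and its allocate method are hypothetical names, and starting each sequence at 1 is an assumption, since the release notes do not say where numbering begins.

```python
import getpass
from collections import defaultdict


class JobIdAllocator:
    """Hypothetical allocator for "string.number" job names (illustration only)."""

    def __init__(self):
        # One independent counter per job name.
        self._counters = defaultdict(int)

    def allocate(self, name=None):
        # The job name defaults to the submitting user's name.
        if name is None:
            name = getpass.getuser()
        # Assumption: the sequence for each name starts at 1.
        self._counters[name] += 1
        return "%s.%d" % (name, self._counters[name])


allocator = JobIdAllocator()
first = allocator.allocate("render")
second = allocator.allocate("render")
other = allocator.allocate("solver")
```

Under these assumptions, each job name keeps its own sequence, so submitting "render" twice and "solver" once gives the second "render" job the number 2 while "solver" starts its own count.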

