Microsoft Word 2002 Unmunger

The Word Unmunger is a small Python program which removes much of the HTML cruft produced by Microsoft Word 2002 (Word version 10), making the files much easier to edit by hand. It removes XML namespace declarations, smart tags, meta tags, HTML comments, style sheets, DIVs, the Microsoft Office file list, CSS classes, and Microsoft Office grammar and spelling error markers.
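The kind of cleanup described above can be approximated with an ordered table of regular expression rules. The following is only a minimal sketch, not the Unmunger's actual code: the `RULES` table, the `unmunge` function, and every pattern in it are hypothetical, and the `SpellE`/`GramE` class names are an assumption about how Word marks spelling and grammar errors.

```python
import re

# Hypothetical rule table: (description, pattern, replacement).
# These patterns are illustrative guesses, not the Unmunger's real rules.
# Order matters: spelling/grammar spans are unwrapped before class
# attributes are stripped, because the span rule relies on the class name.
RULES = [
    ("HTML comments", re.compile(r"<!--.*?-->", re.DOTALL), ""),
    ("style sheets",
     re.compile(r"<style\b.*?</style>", re.DOTALL | re.IGNORECASE), ""),
    ("meta tags", re.compile(r"<meta\b[^>]*>", re.IGNORECASE), ""),
    ("Office file list link",
     re.compile(r"<link\b[^>]*File-List[^>]*>", re.IGNORECASE), ""),
    ("spelling and grammar markers (assumed class names)",
     re.compile(r"<span\s+class=[\"']?(?:SpellE|GramE)[\"']?\s*>(.*?)</span>",
                re.DOTALL | re.IGNORECASE), r"\1"),
    ("namespaced tags such as <o:p> and smart tags",
     re.compile(r"</?\w+:\w+[^>]*>"), ""),
    ("XML namespace declarations",
     re.compile(r"\s+xmlns(?::\w+)?=\"[^\"]*\""), ""),
    ("CSS class attributes",
     re.compile(r"\s+class=(?:\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE), ""),
    ("DIV tags", re.compile(r"</?div\b[^>]*>", re.IGNORECASE), ""),
]


def unmunge(html, debug=False):
    """Apply each rule in order and return the cleaned markup."""
    for description, pattern, replacement in RULES:
        if debug:
            # Print each expression as it is used.
            print(f"{description}: {pattern.pattern}")
        html = pattern.sub(replacement, html)
    return html
```

Regular expressions do not parse HTML, so a sketch like this can misfire on nested spans or on body text that happens to look like an attribute. It is meant only to show the rule-table approach.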


Recent releases

  •  11 Mar 2003 09:11

    Release Notes: The program no longer crashes on larger documents, a failure caused by limitations in Python's default regular expression implementation (sre). The pre implementation is now used instead. A debug mode that prints regular expressions as they are used was added, along with more robust handling of command-line arguments.

  •  22 Dec 2002 06:53

    Release Notes: Based on a request from a user, Word Unmunger now features a batch mode for automatic processing of several files at once. The code has also been cleaned up to allow new unmunging rules to be added more easily.

  •  01 Dec 2002 20:43

    Release Notes: This release adds a new filter for files exported from Word X for Macintosh. Word X puts in a large number of <![ ... ]> tags for conditionals. These are now removed.
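The Word X filter mentioned in the 01 Dec 2002 notes could look roughly like the sketch below. The function name `strip_conditionals` and the pattern are assumptions made for illustration, not the project's actual rule.

```python
import re

# Hypothetical pattern for conditional tags such as <![if ...]> and
# <![endif]>. It removes only the tags and keeps the content between them.
CONDITIONAL_TAG = re.compile(r"<!\[.*?\]>", re.DOTALL)


def strip_conditionals(html):
    """Remove <![ ... ]> conditional tags from exported markup."""
    return CONDITIONAL_TAG.sub("", html)
```

For example, `strip_conditionals('<p><![if !supportEmptyParas]>&nbsp;<![endif]></p>')` leaves `<p>&nbsp;</p>`. Note that this simple pattern would also delete CDATA sections, since they begin with the same `<![` prefix.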

