Catdoc is an MS Word file decoding tool that does not attempt to analyze file formatting; it simply extracts the readable text. It can handle all versions of Word and convert character encodings. It can also read RTF files and convert Excel and PowerPoint files. A Tcl/Tk graphical viewer is also included.
| Tags | Text Processing |
|---|---|
| Licenses | GPL |
| Operating Systems | Linux (32 and 64 bit), BSD |
| Implementation | C |
Catdoc has a new maintainer, Nick Bane, and is now hosted as part of the Alioth project supported by Debian. New VCS content will appear in due course. Bugs should be filed via the Debian Bug Tracking System. Pending changes include a rewrite of the build system to implement a more standardised build. Catdoc is a stable codebase, so large functional changes are unlikely.


Release Notes: This release fixes codepage and charset bugs, handles negative numbers correctly on 64-bit architectures, and fixes a Macintosh MS1904 date bug in xlsparse.
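For readers unfamiliar with the 1904 date issue: Excel workbooks store dates as serial day counts, and workbooks created on the Macintosh commonly count from 1 January 1904 rather than from the 1900-based epoch, a difference of 1462 days. The following is a minimal sketch of how such serials can be mapped to Unix time. It is not taken from catdoc's source; the function name `excel_serial_to_unix` and the `uses_1904` parameter are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Days between each Excel epoch and the Unix epoch (1970-01-01). */
#define DAYS_1900_TO_UNIX 25569 /* 1900 date system; valid for serials >= 61 */
#define DAYS_1904_TO_UNIX 24107 /* 1904 date system; serial 0 is 1904-01-01 */

/*
 * Hypothetical helper (not catdoc's API): convert an Excel serial date
 * to seconds since the Unix epoch. uses_1904 is nonzero when the
 * workbook uses the 1904 (Macintosh) date system.
 */
int64_t excel_serial_to_unix(double serial, int uses_1904)
{
    double days = serial - (uses_1904 ? DAYS_1904_TO_UNIX : DAYS_1900_TO_UNIX);
    double secs = days * 86400.0;

    /* Round to the nearest second without depending on libm. */
    return (int64_t)(secs + (secs < 0 ? -0.5 : 0.5));
}
```

Under these assumptions, the same calendar day has serials that differ by 1462 between the two systems, so reading a 1904-based workbook with the 1900 offset shifts every date by about four years.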


Release Notes: A catppt utility for viewing PowerPoint files was added. Processing of Mac charsets and dates was improved.


Release Notes: Many bugs in the RTF parser and xls2csv have been fixed. Two features have been added: a customizable page separator for multi-page spreadsheets, and a command-line switch to specify the maximum precision of floating-point numbers (the default is now to output as many digits as are available). A bug with reading pre-OLE Word/Write files and text files (Debian bug #255625) has been fixed.