
Hachoir parser

Hachoir parser is a collection of parsers for the most common file formats. It was written for the Hachoir framework. It can open archives (7zip, bzip2, gzip, rpm, tar, unix_archive, zip), audio (aiff, itunesdb, midi, mpeg_audio/mp3, real_audio, sun_next_snd), video (asf, flv, mov, mpeg_ts, mpeg_video), audio/video containers (asn1, matroska, ogg/vorbis, ogg/theora, real_media, riff/avi, riff/wav, swf), filesystems (ext2, fat12, fat16, fat32, iso9660, linux_swap, msdos_harddrive, ntfs, reiserfs), game data (lucasarts_font, spiderman_video, zsnes), images (bmp, gif, ico, jpeg, pcx, png, psd, targa, tiff, wmf, xcf), programs (elf, exe, java_class, python), and more.
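A parser collection that covers this many formats first has to decide which parser fits a given file. A common way to do that is to compare a file's first bytes against known signatures ("magic strings"). The sketch below shows the idea for a handful of the formats listed above. It is not Hachoir's API: `SIGNATURES`, `guess_format` and `guess_file_format` are hypothetical names, and Hachoir additionally validates each candidate rather than trusting the magic alone.

```python
from typing import Optional

# Hypothetical signature table: (leading bytes, format name).
# Only a few of the formats named above are covered here.
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),
    (b"7z\xbc\xaf\x27\x1c", "7zip"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"BM", "bmp"),
    (b"\x7fELF", "elf"),
    (b"MThd", "midi"),
    (b"OggS", "ogg"),
    (b"FLV", "flv"),
]


def guess_format(header: bytes) -> Optional[str]:
    """Return the first format whose signature prefixes `header`, else None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def guess_file_format(path: str) -> Optional[str]:
    """Read only the first 16 bytes of a file and match them against SIGNATURES."""
    with open(path, "rb") as f:
        return guess_format(f.read(16))
```

A short signature such as `BM` can match unrelated data, which is why a real parser should follow the magic check with a structural validation step.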


Recent releases

  •  03 Sep 2008 14:34

    Release Notes: A FLAC parser, an ActionScript parser, and a GNOME keyring parser (which can parse the stored passwords using Python Crypto) were added. The GIF text extension is now supported, and GIF image content is parsed. The charset of IPTC strings was fixed. The TIFF parser was improved to parse image data, and many tags were added. The charset of summary strings in MS Office documents is now guessed.

  •  11 Jul 2007 23:32

    Release Notes: This release supports OLE2 (Word) documents larger than 6 MB. The LNK parser was improved. More subsystem names for PE executables were added. PYC files from Python 2.5c2 are now supported. Many spelling mistakes were fixed.

  •  15 Apr 2007 06:25

    Release Notes: New parsers were added for Microsoft Windows animated icons (.ani), Microsoft's HTML Help (.chm), Windows Shortcuts (.lnk), X11 Portable Compiled Fonts (pcf), Microsoft Archives (.mar), and Adobe Portable Document Format (PDF). Many constants were converted to Unicode. The charset is set to ISO-8859-1 for many strings that had no charset. The MIME type is now a Unicode string. Timestamps are stored as datetime.datetime() objects. MAC48_Address and NIC24 parsers were added, along with a list of IEEE 24-bit organizationally unique identifiers (OUIs).

  •  24 Jan 2007 13:16

    Release Notes: The setup script was rewritten; it uses distutils by default and no longer depends on hachoir-core. The ICO parser now supports Windows cursors. Parsers are more fault tolerant because they use the new HACHOIR_ERRORS constant, which lists minor errors to ignore. The magic string for gzip was fixed. Useless exceptions were removed from the XCF parser. The fourcc handler for RIFF was fixed. For FAT, a ValueError raised by the string index() method is now caught. For ASF, empty fields are no longer created, and validate() checks the header's minimum size. For EXE, validate() checks size_mod_512 in the MS-DOS header, and a method to compute the content size of an MS-DOS executable was added.

  •  17 Jan 2007 15:48

    Release Notes: New parsers were added for 7-zip archives, Audio Interchange File Format (AIFF), Linux swap files, LucasArts Font, New Technology File System (NTFS), Microsoft Enhanced Metafile (EMF), Microsoft Windows Metafile (WMF), Musical Instrument Digital Interface (MIDI), Real Audio (.ra), Real Media (.rm), and Truevision Targa Graphic (TGA) pictures. A method to compute the real content size was added. A magic string to find the file start was added. A method to get the file extension (file name suffix) was added. A method to choose the best MIME type was added. File validation was improved. The bzip2 and gzip parsers now use lazy decompression.
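
The 03 Sep 2008 release notes mention support for the GIF text extension. In GIF89a, extension payloads are stored as "data sub-blocks". Each sub-block is a size byte (1 to 255) followed by that many bytes, and a size byte of 0 terminates the sequence. The Plain Text Extension starts with the bytes 0x21 0x01, followed by a fixed-size header block (12 bytes) and then the text in sub-blocks. The sketch below reads that layout. `read_sub_blocks` and `parse_plain_text_extension` are hypothetical helpers, not Hachoir's field classes.

```python
from typing import Tuple


def read_sub_blocks(data: bytes, pos: int) -> Tuple[bytes, int]:
    """Concatenate GIF data sub-blocks starting at data[pos].

    Returns the joined payload and the offset just past the 0 terminator.
    """
    chunks = []
    while True:
        if pos >= len(data):
            raise ValueError("truncated sub-block sequence")
        size = data[pos]
        pos += 1
        if size == 0:  # block terminator
            return b"".join(chunks), pos
        if pos + size > len(data):
            raise ValueError("truncated sub-block")
        chunks.append(data[pos:pos + size])
        pos += size


def parse_plain_text_extension(data: bytes, pos: int) -> Tuple[bytes, bytes, int]:
    """Split a Plain Text Extension into (header block, text, end offset)."""
    if data[pos:pos + 2] != b"\x21\x01":
        raise ValueError("not a GIF plain text extension")
    header_size = data[pos + 2]  # 12 in GIF89a: text grid position, size, colors
    header_start = pos + 3
    header = data[header_start:header_start + header_size]
    text, end = read_sub_blocks(data, header_start + header_size)
    return header, text, end
```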
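
The 15 Apr 2007 release notes mention MAC48_Address and NIC24 parsers and an IEEE OUI list. A MAC-48 address is 6 octets. The first three form the 24-bit organizationally unique identifier (OUI), which identifies the vendor, and the last three are the vendor-assigned, NIC-specific part (presumably what "NIC24" refers to). The sketch below splits an address that way. All function names are hypothetical, and the vendor table is passed in as a plain dict rather than loaded from the real IEEE registry.

```python
from typing import Dict, Optional


def format_mac48(raw: bytes) -> str:
    """Render six bytes as colon-separated uppercase hex."""
    if len(raw) != 6:
        raise ValueError("a MAC-48 address is exactly 6 bytes")
    return ":".join(f"{octet:02X}" for octet in raw)


def oui24(raw: bytes) -> int:
    """First three octets: the IEEE organizationally unique identifier."""
    return int.from_bytes(raw[:3], "big")


def nic24(raw: bytes) -> int:
    """Last three octets: the NIC-specific part assigned by the vendor."""
    return int.from_bytes(raw[3:6], "big")


def is_multicast(raw: bytes) -> bool:
    # I/G bit: least significant bit of the first octet.
    return bool(raw[0] & 0x01)


def is_locally_administered(raw: bytes) -> bool:
    # U/L bit: second least significant bit of the first octet.
    return bool(raw[0] & 0x02)


def vendor(raw: bytes, oui_table: Dict[int, str]) -> Optional[str]:
    """Look up the vendor name in a caller-supplied OUI table (hypothetical)."""
    return oui_table.get(oui24(raw))
```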
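
The 24 Jan 2007 release notes mention checking size_mod_512 and computing the content size of an MS-DOS executable. In the standard MZ header, the little-endian 16-bit word at offset 2 holds the number of bytes used in the last 512-byte page, and the word at offset 4 holds the total number of pages. A value of 0 in the first field means the last page is full. The sketch below assumes that layout and reuses the field names from the release note. `msdos_content_size` is a hypothetical helper, and the "must be below 512" check is an assumption about what the validation tests.

```python
import struct


def msdos_content_size(header: bytes) -> int:
    """Compute the file size declared by an MZ (MS-DOS executable) header."""
    if len(header) < 6 or header[:2] != b"MZ":
        raise ValueError("not an MZ executable header")
    # Offset 2: bytes in last page; offset 4: number of 512-byte pages.
    size_mod_512, size_div_512 = struct.unpack_from("<HH", header, 2)
    if size_mod_512 >= 512:
        raise ValueError("size_mod_512 must be below 512")
    if size_div_512 == 0:
        raise ValueError("page count must be at least 1")
    if size_mod_512 == 0:
        # Last page is completely used.
        return size_div_512 * 512
    return (size_div_512 - 1) * 512 + size_mod_512
```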
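
The 17 Jan 2007 release notes say the bzip2 and gzip parsers use lazy decompression. The general idea is to decompress only as much as the consumer actually asks for, instead of inflating the whole stream up front. The sketch below illustrates this with Python's standard `zlib` and `bz2` incremental decompressors wrapped in a generator. `lazy_decompress` and `read_chunks` are hypothetical names, and this is not how Hachoir implements it internally.

```python
import bz2
import zlib
from typing import Iterable, Iterator


def lazy_decompress(chunks: Iterable[bytes], kind: str) -> Iterator[bytes]:
    """Yield decompressed data piece by piece as compressed chunks are consumed.

    `kind` is "gzip" or "bzip2" (a hypothetical parameter for this sketch).
    """
    if kind == "gzip":
        # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    elif kind == "bzip2":
        decomp = bz2.BZ2Decompressor()
    else:
        raise ValueError(f"unsupported kind: {kind!r}")
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
        if decomp.eof:
            # End of the compressed stream; any remaining input is ignored.
            break
    # zlib objects may hold buffered output; BZ2Decompressor has no flush().
    flush = getattr(decomp, "flush", None)
    if flush is not None:
        tail = flush()
        if tail:
            yield tail


def read_chunks(path: str, size: int = 64 * 1024) -> Iterator[bytes]:
    """Read a file lazily in fixed-size chunks."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(size), b"")
```

Because both functions are generators, `lazy_decompress(read_chunks(path), "gzip")` reads and inflates the file only as far as the caller iterates.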


            Project Spotlight


            A Fluent OpenStack client API for Java.


            Project Spotlight

            TurnKey TWiki Appliance

            A TWiki appliance that is easy to use and lightweight.