GNU libextractor

libextractor is a library for extracting meta-data from files of arbitrary type. It is designed to delegate the actual extraction to helper libraries and to be easily extensible by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library for obtaining meta-data about files. It includes a shell command and bindings for Java (JNI) and Python.

Recent releases

•  22 Dec 2013 22:51

   Release Notes: This release adds a plugin for extracting audio previews and ensures that one blocking (or slow) plugin does not prevent other plugins from progressing.

•  19 Oct 2013 14:38

   Release Notes: This release fixes silent IPC failures on slow machines, where a timeout previously caused incomplete meta-data extraction. It also updates the Dutch translation, now requires an external installation of libltdl, and fixes build issues with recent versions of libavcodec and libtidy.

•  06 Oct 2012 13:31

   Release Notes: This release fixes plugin discovery on OS X. Nothing has changed on other systems, so users on other platforms do not need to update from 1.0.0.

•  25 Sep 2012 15:13

   Release Notes: Major changes to the plugin mechanism now give out-of-process plugins full random access to the entire file. Most plugins have been rewritten to the new plugin API. The external (libextractor) API remains unchanged and compatible with 0.6. As part of the rewrite, many plugins were changed to use standard third-party libraries (libjpeg, libtiff, libgif, libtidy, and libmagic) for parsing. A new plugin based on gstreamer replaces many existing multimedia plugins. Automated test cases were also written for almost all of the plugins, and the documentation was updated.

•  28 Nov 2011 11:59

   Release Notes: This release adds support for Matroska, fixes some minor bugs (leaks on error-handling paths), and performs some minor code cleanup (fixing compiler warnings about dead code).

Recent comments

02 Feb 2008 05:00 grothoff

Re: online demo not working
There are two PDF plugins, one that is quite simplistic and another one based on code from xpdf (which has a bad security track record). Depending on which one I happen to enable on the website (options to configure), you get more or less information for PDF files.

> When I upload dmca.pdf all it gives me
> is mimetype. Am I missing something?

24 Jan 2008 15:40 baloney

online demo not working
When I upload dmca.pdf all it gives me is mimetype. Am I missing something?

14 Aug 2005 21:25 grothoff

Re: Also Requires gobject-2.0
Note that as of 0.5.3, LE still needs gobject-2.0, but the ordinary shared version will do fine now.

27 Jan 2005 10:15 grothoff

Re: Also Requires gobject-2.0
Well, gobject-2.0 is part of glib, so it is listed as a dependency. What is trickier is that we need the static, relocatable version of the library, but try to specify that on freshmeat :-).

27 Jan 2005 10:07 dforce

Also Requires gobject-2.0
Can't seem to get the OLE2 libraries to compile; make complains:

/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.3/../../../../i686-pc-linux-gnu/bin/ld: cannot find -lgobject-2.0

Oh, and you may want to list these dependencies in either the README or the INSTALL file.