Natural Language Toolkit

NLTK, the Natural Language Toolkit, is a suite of Python libraries and programs for symbolic and statistical natural language processing. NLTK includes graphical demonstrations and sample data. It is accompanied by extensive documentation, including tutorials that explain the underlying concepts behind the language processing tasks supported by the toolkit.

Recent releases

  •  27 Apr 2006 13:48

    No changes have been submitted for this release.

  •  20 Mar 2004 22:09

    Release Notes: Significant changes were made to NLTK's basic architecture. These changes make the basic processing tasks easier to use and make it easier to combine different processing tasks into a single system.

  •  05 Nov 2003 10:50

    Release Notes: This version adds four new corpora and corpus readers (the names, stopwords, semcor, and wordnet corpora) and several new modules in nltk-contrib. It splits nltk.token into two modules: nltk.token defines Token and Location, and nltk.tokenizer defines tokenizers. It also adds a look-ahead window for sequential tagging and fixes various bugs.

  •  19 Aug 2003 05:45

    Release Notes: This version adds two new packages: nltk-data, a package containing sample datasets, and nltk-contrib, a package containing third-party contributions that have not (yet) been incorporated into the toolkit. It also includes significant improvements to the documentation, including new tutorials, revised tutorials, and improved API documentation. It adds a new module that defines a standard interface for stemmers and implements the Porter stemmer. It also contains several improvements to the graphical demos.

  •  05 Apr 2003 04:28

    Release Notes: An overhaul of nltk.probability was completed. The Tagger module design was updated to allow for better backoff. Many tutorials are new or updated (regexp, tagging, probability, and intro). Two kinds of chart edges are now distinguished: token edges (used to initialize the chart) and production edges. Assorted minor improvements were also made.

Recent comments

04 Apr 2003 17:20 brainless

Tutorials for NLTK

Several tutorials for NLTK are available at this page.

Come and get 'em!

