HTML Parser

HTML Parser is a Java library for parsing HTML in either a linear or a nested fashion. It is used primarily for transformation or extraction, and it features filters, visitors, custom tags, and easy-to-use JavaBeans. It is a fast, robust, and well-tested package.
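To make the idea of filters concrete, here is a minimal, self-contained sketch of composable node filters applied to a linear list of tags. The `Tag` and `TagFilter` types and the `extract` method are hypothetical stand-ins written for this illustration; they are not the library's own classes or signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterSketch {
    /** Hypothetical stand-in for a parsed tag: a name and an optional id. */
    public static final class Tag {
        public final String name;
        public final String id; // null when the tag has no id attribute

        public Tag(String name, String id) {
            this.name = name;
            this.id = id;
        }
    }

    /** Hypothetical filter contract: decide whether a tag is kept. */
    public interface TagFilter {
        boolean accept(Tag tag);

        // Logical combinators build new filters out of existing ones.
        default TagFilter and(TagFilter other) { return t -> accept(t) && other.accept(t); }
        default TagFilter or(TagFilter other)  { return t -> accept(t) || other.accept(t); }
        default TagFilter xor(TagFilter other) { return t -> accept(t) ^ other.accept(t); }
        default TagFilter not()                { return t -> !accept(t); }
    }

    /** Linear extraction: keep every tag the filter accepts, in document order. */
    public static List<Tag> extract(List<Tag> tags, TagFilter filter) {
        List<Tag> kept = new ArrayList<>();
        for (Tag tag : tags) {
            if (filter.accept(tag)) {
                kept.add(tag);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Tag> tags = Arrays.asList(
            new Tag("a", "home"), new Tag("p", null),
            new Tag("a", null), new Tag("div", "main"));

        TagFilter isLink = t -> t.name.equals("a");
        TagFilter hasId = t -> t.id != null;

        // Links that carry an id.
        System.out.println(extract(tags, isLink.and(hasId)).size());
        // Tags that are either a link or have an id, but not both.
        System.out.println(extract(tags, isLink.xor(hasId)).size());
    }
}
```

Keeping each filter a single-method predicate means that combinators such as `and`, `or`, `not`, and `xor` can be layered freely without the extraction loop changing.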


Recent releases

  •  25 Sep 2006 06:01

    Release Notes: The license has been changed to the CPL. Maven2 is now used as the build environment. Subversion is used for the source repository. A new Web site was created. <<tag> is now correctly parsed as text. A method to render the start of a tag in HTML was added. CssSelectorNodeFilter does not accept [attr|=val].

  •  11 Jun 2006 04:50

    Release Notes: Support was added for commonly requested composite tags. Several enhancements were made to the filtering functionality. Additions were made to the HTTP connection processing subsystem. Other user-requested features and bug fixes were added.

  •  28 May 2006 07:39

    Release Notes: This is the first candidate for the final 1.6 release. All outstanding bugs have been fixed. A new XorFilter rounds out the logical node filters.

  •  20 Mar 2006 04:22

    Release Notes: NodeTreeWalker, a utility class to traverse a tree of Node objects in either depth-first or breadth-first order, has been added. Several other bug fixes and patches have been incorporated.

  •  12 Nov 2005 18:52

    Release Notes: Support has been added for commonly requested composite tags: P, H1-H6, and the definition list tags (DL, DT, DD). The node interface has been augmented with get first/last child and get previous/next sibling methods to ease traversing the HTML document.
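The release notes above mention depth-first and breadth-first traversal as well as first-child and next-sibling navigation. The sketch below shows how the two orders differ on a small tree. The `Node` class here is a hypothetical stand-in linked by first-child and next-sibling references; it does not reproduce the library's Node interface or the NodeTreeWalker API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TraversalSketch {
    /** Hypothetical node, linked by first-child and next-sibling references. */
    public static final class Node {
        final String name;
        Node firstChild;
        Node lastChild;
        Node nextSibling;

        public Node(String name) {
            this.name = name;
        }

        /** Appends a child and returns this node, so trees can be built inline. */
        public Node add(Node child) {
            if (firstChild == null) {
                firstChild = child;
            } else {
                lastChild.nextSibling = child;
            }
            lastChild = child;
            return this;
        }
    }

    /** Depth-first order: visit a node, then each child subtree left to right. */
    public static List<String> depthFirst(Node root) {
        List<String> order = new ArrayList<>();
        visit(root, order);
        return order;
    }

    private static void visit(Node node, List<String> order) {
        order.add(node.name);
        for (Node c = node.firstChild; c != null; c = c.nextSibling) {
            visit(c, order);
        }
    }

    /** Breadth-first order: visit level by level using a FIFO queue. */
    public static List<String> breadthFirst(Node root) {
        List<String> order = new ArrayList<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            order.add(node.name);
            for (Node c = node.firstChild; c != null; c = c.nextSibling) {
                queue.add(c);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Node html = new Node("html")
            .add(new Node("head").add(new Node("title")))
            .add(new Node("body").add(new Node("h1")).add(new Node("p")));

        System.out.println(depthFirst(html));   // [html, head, title, body, h1, p]
        System.out.println(breadthFirst(html)); // [html, head, body, title, h1, p]
    }
}
```

Depth-first order matches the order in which tags appear in the document source, while breadth-first order is convenient when the shallowest match is wanted first.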

