webcheck


webcheck is a website checking tool for webmasters. It crawls a given website and generates a number of reports. The whole system is pluggable, so extra reports and checks can be added easily. It can retrieve sites over the HTTP, file, and FTP protocols, and it produces reports on site structure, broken links, old Web pages, overviews of external links, and more. Which links webcheck treats as external is configurable through regular expressions, and webcheck honors robots.txt.
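As a rough illustration of the last point, the sketch below shows one way a crawler could combine regular-expression rules for external links with a robots.txt check. It is not webcheck's actual code: the function names, the example patterns, the host name, and the user agent string are all hypothetical, and only the Python standard library is used.

```python
import re
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical configuration: any URL matching one of these patterns is
# treated as external, even if it lives on the site being checked.
EXTERNAL_PATTERNS = [re.compile(r"/downloads/")]


def is_external(url, base_host, external_patterns=EXTERNAL_PATTERNS):
    """Return True if url is on another host or matches an external pattern."""
    host = urlsplit(url).hostname or ""
    if host != base_host.lower():
        return True
    return any(pattern.search(url) for pattern in external_patterns)


def load_robot_rules(robots_txt_lines):
    """Build a robots.txt rule set from an iterable of already fetched lines."""
    rules = RobotFileParser()
    rules.parse(robots_txt_lines)
    return rules


def should_crawl(url, base_host, rules, user_agent="example-checker"):
    """Crawl only internal URLs that robots.txt allows for this user agent."""
    if is_external(url, base_host):
        return False
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = load_robot_rules(["User-agent: *", "Disallow: /private/"])
    for link in (
        "http://example.org/index.html",
        "http://example.org/private/notes.html",
        "http://example.org/downloads/file.zip",
        "http://other.example.com/",
    ):
        print(link, should_crawl(link, "example.org", rules))
```

In this sketch an external link is simply not followed. A real checker would probably still test that the link resolves, without crawling any further from it.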


Recent releases

  •  11 Sep 2010 21:56

    Release Notes: This is a maintenance release that gathers some outstanding fixes. It also removes some unnecessary debugging code, limits the "referenced from" list to 10 items, adds a Referrer header where possible, and includes some Debian packaging improvements.

  •  19 Jul 2008 23:11

    Release Notes: This release adds fixes for a number of small problems in the 1.10.2 release. It also adds parsing of the iframe and script tags and style attributes. It also implements calling tidy when it is available on the system.

  •  04 Nov 2007 11:55

    Release Notes: This minor update includes checking for a bug in some versions of BeautifulSoup. Support for running on Python 2.3 was added again. Small documentation improvements and Debian package improvements were made.

  •  15 Jul 2007 15:42

    Release Notes: This release includes some big performance improvements (especially for very large sites). Fixes were made for a problem when using --continue with some messages with non-ASCII characters and a crash with some zero-size pages. webcheck now also parses the http-equiv meta header refresh option.

  •  13 May 2007 06:43

    Release Notes: This release changes the HTML parser to BeautifulSoup (when available). This parser is much more error-tolerant than the old HTMLParser based solution but is also slightly slower. Some small output improvements were made as well as some internal improvements to better support Unicode content. Parsing of robots.txt files was re-enabled and an --ignore-robots option was added.

