2000-09-29 06:50:53 -04:00

This is a collection of modules that parse and extract information
from HTML documents. Bug reports and discussions about these modules
can be sent to the <libwww@perl.org> mailing list. See also the
HTML-Tree package, which builds HTML syntax trees and extracts
information from them.

The modules present in this collection are:

HTML::Parser - The parser base class. It receives arbitrarily sized
chunks of the HTML text, recognizes markup elements, and
separates them from the plain text. As different kinds of
markup and text are recognized, the corresponding event
handlers are invoked.

HTML::Entities - Provides functions to encode and decode text
with embedded HTML entities.

HTML::HeadParser - A lightweight HTML::Parser subclass that
extracts information from the <HEAD> section of an HTML document.

HTML::LinkExtor - An HTML::Parser subclass that extracts links
from an HTML document.

HTML::TokeParser - An alternative interface to the basic parser
that does not require event-driven programming. Most simple
parsing needs are probably best handled with this module.
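
As a rough illustration of the event-driven model described for HTML::Parser, the sketch below feeds a document to the parser in two chunks and records which handlers fire. It assumes the version 3 handler interface (api_version => 3); the sample markup and the @events array are made up for the example.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Illustrative only: collect a log of the events the parser reports.
my @events;

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { push @events, "start:$_[0]" }, 'tagname' ],
    end_h   => [ sub { push @events, "end:$_[0]" },   'tagname' ],
    text_h  => [ sub { push @events, "text:$_[0]" },  'dtext'   ],
);

# The document may arrive in arbitrarily sized chunks.
$p->parse('<p>Hello, ');
$p->parse('<b>world</b></p>');
$p->eof;    # signal end of input so buffered text is flushed

print "$_\n" for @events;
```

Text may be reported in more than one piece, so code that needs the full text of an element should concatenate the text events rather than rely on a single callback.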
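
For HTML::Entities, a minimal sketch of a round trip through the two exported functions, encode_entities and decode_entities, might look like this. The sample strings are invented for the example.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Escape characters that are unsafe in HTML text.
my $encoded = encode_entities('a < b & c');
print "$encoded\n";    # a &lt; b &amp; c

# Turn entity references back into the characters they stand for.
my $decoded = decode_entities('&lt;tag&gt; &amp; &quot;text&quot;');
print "$decoded\n";    # <tag> & "text"
```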
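
HTML::HeadParser can be sketched in a similar way. The example below assumes that the HTTP::Headers module, which HTML::HeadParser uses to store its results, is installed, and that the document title is exposed under the 'Title' header name. The sample document is made up.

```perl
use strict;
use warnings;
use HTML::HeadParser ();

# Illustrative document; only the <head> section is of interest.
my $doc = '<html><head><title>Example page</title>'
        . '<meta name="description" content="A demo page">'
        . '</head><body><p>Body text is not examined.</p></body></html>';

my $hp = HTML::HeadParser->new;
$hp->parse($doc);

# Results are stored in an HTTP::Headers-style object.
my $title = $hp->header('Title');
print "Title: $title\n";

# <meta name="..."> elements are assumed to appear as X-Meta-* headers.
my $description = $hp->header('X-Meta-Description');
print "Description: $description\n" if defined $description;
```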
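
For HTML::LinkExtor, one possible usage is to pass a callback to the constructor and collect each link as it is found. In this sketch no base URL is given, so link values are reported as they appear in the markup; the sample HTML and the @links array are assumptions for the example.

```perl
use strict;
use warnings;
use HTML::LinkExtor ();

my @links;

# The callback receives the tag name followed by its link attributes.
my $extor = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @links, { tag => $tag, %attr };
});

$extor->parse('<a href="http://www.example.com/">Home</a> <img src="logo.png">');
$extor->eof;

for my $link (@links) {
    my $url = $link->{href} // $link->{src};
    print "$link->{tag}: $url\n";
}
```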
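
The pull-style interface of HTML::TokeParser can be sketched as a simple loop: ask for the next tag of interest, then read the text up to its end tag. The sample document and the @found array below are invented for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser ();

my $html = '<html><head><title>Demo</title></head><body>'
         . '<a href="/one">First</a> <a href="/two">Second</a>'
         . '</body></html>';

# A scalar reference is treated as the document text itself.
my $tp = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my @found;
while (my $token = $tp->get_tag('a')) {
    my $href = $token->[1]{href};              # attribute hash is element 1
    my $text = $tp->get_trimmed_text('/a');    # text up to the closing </a>
    push @found, "$href=$text";
}

print "$_\n" for @found;
```

No handlers are registered here; the calling code decides when to advance through the document, which tends to suit small extraction tasks.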