This is a collection of modules that parse and extract information
from HTML documents. Bug reports and discussions about these modules
can be sent to the <libwww@perl.org> mailing list. Remember to
also look at the HTML-Tree package, which builds HTML syntax trees
and extracts information from them.

The modules present in this collection are:

  HTML::Parser - The parser base class. It receives arbitrarily sized
        chunks of HTML text, recognizes markup elements, and
        separates them from the plain text. As each kind of
        markup or text is recognized, the corresponding event
        handler is invoked.

  HTML::Entities - Provides functions to encode and decode text
        with embedded HTML entities (such as &amp; and &lt;).

  HTML::HeadParser - A lightweight HTML::Parser subclass that
        extracts information from the <HEAD> section of an HTML document.

  HTML::LinkExtor - An HTML::Parser subclass that extracts links
        from an HTML document.

  HTML::TokeParser - An alternative interface to the basic parser
        that does not require event-driven programming. Most simple
        parsing needs are probably best served by this module.

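The HTML::Parser entry describes chunked input and event handlers. The following is a minimal sketch of that interface, assuming the version 3 API (`api_version => 3`). The sample document, the chunk boundaries and the `@tags`/`@text` arrays are made up for illustration; `unbroken_text => 1` is set so that text split across chunks reaches the text handler as a single event.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical collectors for what the handlers report.
my (@tags, @text);

my $p = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,    # deliver each text segment in one piece
    # Start-tag handler: receives the (lowercased) tag name.
    start_h => [ sub { push @tags, $_[0] }, 'tagname' ],
    # Text handler: receives text with entities already decoded.
    text_h  => [ sub { push @text, $_[0] if $_[0] =~ /\S/ }, 'dtext' ],
);

# Feed the document in arbitrary pieces; one tag is even split in two.
my @chunks = (
    '<html><body><p>Fish &amp; ',
    'chips</p><a hr',
    'ef="/menu">Menu</a></body></html>',
);
$p->parse($_) for @chunks;
$p->eof;    # signal end of input so any buffered text is flushed
```

Without `unbroken_text`, the parser may report "Fish & " and "chips" as two separate text events, because it hands text over as soon as it safely can.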
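For HTML::Entities, a short sketch of its two main functions, `encode_entities` and `decode_entities`; the input strings are hypothetical. When the result is assigned, each function returns a converted copy and leaves its argument alone; called in void context, it converts the argument in place.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Hypothetical input containing characters that are unsafe in HTML.
my $raw = q{Tom & Jerry say "5 < 6"};

# Escapes &, <, >, quotes and non-ASCII characters by default.
my $encoded = encode_entities($raw);

# Named and numeric entities are both turned back into characters.
my $decoded = decode_entities('caf&eacute; &lt;b&gt; &#65;');

print "$encoded\n";
```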
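HTML::HeadParser stores what it finds in the <HEAD> section as header fields. The following sketch assumes the HTTP::Headers module (from the HTTP-Message distribution) is installed, since the parser uses it for storage. The document is hypothetical; `<title>` is reported as the `Title` field and `<base href>` as `Content-Base`.

```perl
use strict;
use warnings;
use HTML::HeadParser ();

# Hypothetical document; only the head section is of interest.
my $html = <<'HTML';
<html><head>
<title>Quarterly Report</title>
<base href="http://www.example.com/reports/">
</head><body><p>The parser stops before this text.</p></body></html>
HTML

my $hp = HTML::HeadParser->new;
$hp->parse($html);    # returns false once the head section has ended

# Look up individual fields by header name.
my $title = $hp->header('Title');
my $base  = $hp->header('Content-Base');

print "$title ($base)\n";
```

Because `parse` returns false at the end of the head section, a caller reading a large file in pieces can stop feeding it early.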
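With HTML::LinkExtor created without a callback, links accumulate and are returned by `links`. Each entry is an array reference of the form `[$tag, $attr => $url, ...]`. This sketch uses hypothetical markup and passes no base URL, so URLs come back exactly as written. Passing a base as the second constructor argument makes them absolute, which relies on the URI module.

```perl
use strict;
use warnings;
use HTML::LinkExtor ();

# Hypothetical markup with one hyperlink and one image.
my $html = q{<p><a href="/docs">Docs</a> <img src="logo.png" alt=""></p>};

# No callback and no base URL: links are collected as written.
my $lx = HTML::LinkExtor->new;
$lx->parse($html);
$lx->eof;

# Keep just the tag name and the first URL of each entry.
my @found = map { [ $_->[0], $_->[2] ] } $lx->links;

print "$_->[0]: $_->[1]\n" for @found;
```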
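HTML::TokeParser lets the caller pull tokens on demand instead of registering handlers. This sketch runs over a hypothetical list of links. It reads each `<a>` start tag with `get_tag` and the link text with `get_trimmed_text`. The `"$href=$text"` output format is purely illustrative.

```perl
use strict;
use warnings;
use HTML::TokeParser ();

# Hypothetical document; the parser also accepts a filename or file handle.
my $html = q{<ul><li><a href="/a">First</a></li><li><a href="/b"> Second </a></li></ul>};
my $tp   = HTML::TokeParser->new(\$html);

my @links;
# Ask for the next <a> start tag; returns undef when none are left.
while (my $tag = $tp->get_tag('a')) {
    my $href = $tag->[1]{href};               # token is [$name, \%attr, \@attrseq, $text]
    my $text = $tp->get_trimmed_text('/a');   # text up to </a>, whitespace trimmed
    push @links, "$href=$text";
}

print "$_\n" for @links;
```

The loop reads top to bottom like ordinary sequential code, which is why this style often suits simple extraction tasks.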