Okan Demirmen <okan@demirmen.com>.
This module returns an object that works like a Hash, Array, Scalar,
Code and Glob object at the same time.
help and ok sturm@
These are XSL stylesheets for transforming DocBook XML document
instances into various output formats. (They would also work for
transforming DocBook SGML document instances, modulo certain namecase
problems and the fact that there aren't (yet) any XSL implementations
that work with SGML source documents.)
From Bernd Ahlers <b.ahlers@ba-net.org>
help & ok aanriot@
--
OCaml-RSS is a library to parse RSS 2.0 files (which are XML files)
and build a structure representing the document. Some functions
are also provided to print RSS documents from the structure.
The parser also tries to parse some RDF files, but ignores many fields
of these RDF files.
--
Xml Light is a minimal Xml parser & printer for OCaml. It provides
a few functions to parse a basic Xml document into an OCaml data
structure and to print back the data structures to an Xml document.
Xml Light also supports DTDs (Document Type Definitions).
PyRTF is a pure Python module for the efficient generation of rich text
format documents. It has good support for tables and tries to maintain
compatibility with as many RTF readers as possible.
Docutils is a set of tools for processing plaintext documentation into
useful formats, such as HTML, XML, and LaTeX. Includes reStructuredText,
the easy to read, easy to use, what-you-see-is-what-you-get plaintext
markup language.
From Ben Lovett <ben@tilderoot.com>
This is a package of an XML::Parser style and
generic classes for easily parsing XML documents into
native object-oriented Perl form.
from Sam Smith <s at msmith.net>
This is a rather incomplete implementation of work done by Gudrun Putze-Meier
<gudrun.pm@t-online.de>. I have to confess that I never read her original
paper. So all credit belongs to her, all bugs are mine. I tried to get some
insight from an implementation of two students of mine. They remain anonymous
because their work was the worst piece of code I ever saw. My code behaves
mostly as their implementation did except it is about 75 times faster.
txt2tags is a format conversion tool written in Python that generates
HTML, XHTML, SGML, LaTeX, Lout, Man Page, MoinMoin, Magic Point and
PageMaker documents from a single text file with minimal markup.
From Gleydson Soares <mail@gsoares.org>
help & ok xsa@
Sort::Versions allows easy sorting of mixed non-numeric and numeric strings,
like the 'version numbers' that many shared library systems and revision
control packages use. This is quite useful if you are trying to deal with
shared libraries. It can also be applied to applications that intersperse
variable-width numeric fields within text. Other applications can
undoubtedly be found.
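
The idea of comparing numeric fields as numbers rather than as text can be
sketched in Python as follows. This is a simplified natural-sort key and not
the comparison algorithm of Sort::Versions itself; the name `version_key` is
hypothetical and chosen only for this illustration.

```python
import re

def version_key(text):
    """Build a sort key: digit runs compare as integers, other runs as text."""
    key = []
    for number, other in re.findall(r"(\d+)|(\D+)", text):
        if number:
            # Tag numeric fields with 0 so they never get compared to strings.
            key.append((0, int(number)))
        else:
            key.append((1, other))
    return key

libs = ["lib.so.1.10", "lib.so.1.2", "lib.so.1.9"]
ordered = sorted(libs, key=version_key)
print(ordered)
```

A plain string sort would place `lib.so.1.10` before `lib.so.1.2`; with the
key above, the field `10` is compared as a number and sorts last.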
The cElementTree module is a C implementation of the ElementTree API. On
typical documents, it's 15-20 times faster than the Python version of
ElementTree, and uses 2-5 times less memory.
looks good & ok xsa@, mbalmer@
The ElementTree type is a simple but flexible container object, designed
to store hierarchical data structures, such as simplified XML infosets,
in memory. The element type can be described as a cross between a Python
list and a Python dictionary.
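
A minimal sketch of that list/dictionary duality, using the
`xml.etree.ElementTree` module from the Python standard library (which
provides the same API). The element and attribute names are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Build a small tree; tag and attribute names are illustrative only.
root = ET.Element("package", {"name": "elementtree"})
ET.SubElement(root, "category").text = "textproc"
ET.SubElement(root, "category").text = "devel"

# Dictionary-like: attributes are read and written by key.
root.set("status", "ok")
name = root.get("name")

# List-like: children support len(), indexing, and iteration.
child_count = len(root)
categories = [child.text for child in root]

print(name, child_count, categories)
```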
looks good & ok xsa@, mbalmer@
restrictions. This will remain broken until either the author of Lingua::Stem
removes the dependency on Text::German or the Text::German author rereleases
Text::German under an appropriate license.
The stem function takes a scalar as a parameter and stems the word
according to Martin Porter's Danish stemming algorithm, which can be
found at the Snowball website: <http://snowball.tartarus.org/>.
The stem function takes a scalar as a parameter and stems the word
according to Martin Porter's Danish stemming algorithm, which can be
found at the Snowball website: <http://snowball.tartarus.org/>.
This module implements a Portuguese stemming algorithm proposed in the
paper "A Stemming Algorithm for the Portuguese Language" by Moreira, V.
and Huyck, C.
This module provides simple word wrapping. It breaks long lines, but
does not alter spacing or remove existing line breaks. If you're
looking for more sophisticated text formatting, try the Text::Format
module.
In short, Text::Wrapper is the object-oriented equivalent of Text::Wrap,
but with fewer bugs (I hope).
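
The behaviour described above (break long lines, keep existing line breaks)
can be approximated in Python with the standard `textwrap` module. This is a
rough sketch under that assumption, not Text::Wrapper's implementation; the
function name `wrap_keep_breaks` is hypothetical.

```python
import textwrap

def wrap_keep_breaks(text, width=70):
    """Wrap each existing line on its own so original line breaks survive."""
    wrapped = []
    for line in text.split("\n"):
        # textwrap.wrap returns [] for an empty line; keep it as a blank line.
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(wrapped)

result = wrap_keep_breaks("aaa bbb ccc\n\nshort", width=7)
print(result)
```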
Text::Quoted examines the structure of some text which
may contain multiple different levels of quoting, and
turns the text into a nested data structure.
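
To illustrate the idea of turning quote levels into nesting, here is a
simplified Python sketch. It assumes `>` is the only quote marker and returns
plain nested lists; the structure Text::Quoted actually returns is richer and
is not reproduced here. The names `quote_depth` and `nest` are hypothetical.

```python
def quote_depth(line):
    """Count leading '>' markers, skipping spaces between them."""
    depth = 0
    for ch in line:
        if ch == ">":
            depth += 1
        elif ch != " ":
            break
    return depth

def nest(text):
    """Return nested lists: each extra quote level opens a deeper list."""
    root = []
    stack = [root]
    for line in text.splitlines():
        depth = quote_depth(line)
        while len(stack) - 1 < depth:   # open deeper levels
            child = []
            stack[-1].append(child)
            stack.append(child)
        while len(stack) - 1 > depth:   # close finished levels
            stack.pop()
        stack[-1].append(line.lstrip("> ") if depth else line)
    return root

tree = nest("Hi\n> first\n> > deeper\n> back\nbye")
print(tree)
```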
By default, this module exports a single hash (`%RE') that stores or
generates commonly needed regular expressions. Patterns currently
provided include:
* balanced parentheses and brackets
* delimited text (with escapes)
* integers and floating-point numbers in any base (up to 36)
* comments in C, C++, Perl, and shell
* offensive language
* lists of any pattern
* IPv4 addresses
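
As an illustration of one such pattern, the following Python sketch matches
dotted-quad IPv4 addresses. The pattern is written for this example and is
not the one shipped in the module; in particular, rejecting leading zeros is
an assumption made here. The names `OCTET`, `IPV4`, and `is_ipv4` are
hypothetical.

```python
import re

# One octet: 0-255 without leading zeros (an assumption of this sketch).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

def is_ipv4(candidate):
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(candidate) is not None

print(is_ipv4("192.168.0.1"), is_ipv4("256.1.1.1"))
```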
XML::RAI is an object-oriented layer that maps
overlapping and alternate tags in RSS to one common
simplified interface.
from Sam Smith <s at msmith.net>
XML::RSS::Parser is a lightweight liberal parser of RSS
feeds. This parser is "liberal" in that it does not
demand compliance of a specific RSS version and will
attempt to gracefully handle tags it does not expect or
understand.
from Sam Smith <s at msmith.net>
Universal Feed Parser is a Python module for downloading and parsing
syndicated feeds. It can handle RSS 0.90, Netscape RSS 0.91, Userland
RSS 0.91, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom, and CDF
feeds.
Universal Feed Parser is easy to use; the module is self-contained in a
single file, feedparser.py, and it has only one public function, parse.
parse takes a number of arguments, but only one is required, and it can
be a URL, a local filename, or a raw string containing feed data in any
format.
Multiple integer overflow issues affecting xpdf.
These can result in writing an arbitrary byte to an attacker-controlled
location, which could probably lead to arbitrary code execution.
CAN-2004-0888
Multiple integer overflow issues.
These can result in DoS or possibly arbitrary code execution.
CAN-2004-0889
Chris also discovered an infinite-loop logic error.