From: Christopher Zimmermann <madroach@gmerlin.de>
--
Xmlm is an OCaml module for streaming XML IO. It aims at making XML
processing robust and painless. The streaming interface can process
documents without building an in-memory representation. It lets the
programmer translate their data structures to XML documents and
vice-versa. Functions are provided to easily transform arborescent
data structures to/from XML documents.
From: Christopher Zimmermann <madroach@gmerlin.de>
--
OCaml-Text is a library for dealing with ``text'', i.e. sequences of
Unicode characters, in a convenient way.
It supports:
- character encoding/decoding using iconv
- manipulation of text as UTF-8 encoded strings
- localised text functions such as compare, upper, ...
- human-readable regular expressions inside patterns
* new MASTER_SITES and HOMEPAGE
* use new PROPERTY ocaml_native
* patch to support install on bytecode-only arch
From: Christopher Zimmermann <madroach@gmerlin.de>
Description:
cloc counts blank lines, comment lines, and physical lines of source
code in many programming languages. Given two versions of a code base,
cloc can compute differences in blank, comment, and source lines.
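
As a rough illustration of the kind of counting described above, here is a
minimal Python sketch. It is not cloc's implementation: the names
count_lines and diff_counts are hypothetical, only single-line comments
with one prefix are recognized, and the "difference" is just the change in
totals rather than a line-by-line comparison.

```python
def count_lines(source, comment_prefix="#"):
    """Classify each physical line as blank, comment, or code."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


def diff_counts(old, new):
    """Per-category change in totals between two versions of a file."""
    return {key: new[key] - old[key] for key in old}
```

A real counter also has to handle block comments, strings containing
comment markers, and per-language comment syntax.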
ok edd@ gonzalo@
and Chris Bennett. Earlier version OK landry@
TeX::Encode exports the function 'latex_encode', which encodes
characters in a string that would otherwise be interpreted incorrectly
by LaTeX.
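
To show the general idea of such encoding, here is a minimal Python sketch
that escapes the ten ASCII characters LaTeX treats specially. The function
name escape_latex and the mapping table are assumptions made for this
example; they are not taken from TeX::Encode, which may cover a different
and larger set of characters.

```python
# Replacement text for characters that LaTeX would otherwise interpret.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time.

    Working per character avoids escaping the backslashes and braces
    that earlier replacements have just introduced.
    """
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)
```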
and Chris Bennett.
The LaTeX::Driver module takes care of running and re-running latex on
a LaTeX document so that forward references, tables of contents, and
lists of figures and tables are resolved. It will also run bibtex and
makeindex if it detects that a bibliography or an index has been
specified, and will re-run latex again one or more times until the
formatting of the document has stabilized.
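
The "re-run until stable" idea can be sketched in Python as below. This is
an illustration under stated assumptions, not the module's logic:
run_until_stable, run_pass and read_state are hypothetical names. In
practice run_pass might invoke latex and read_state might return the
contents of the .aux file, but both are left to the caller here.

```python
def run_until_stable(run_pass, read_state, max_passes=5):
    """Call run_pass() until read_state() stops changing.

    Returns the number of passes performed. Raises RuntimeError if the
    state is still changing after max_passes passes.
    """
    previous = read_state()
    for passes in range(1, max_passes + 1):
        run_pass()
        current = read_state()
        if current == previous:
            # Nothing changed in this pass, so references are resolved.
            return passes
        previous = current
    raise RuntimeError(
        "document did not stabilize after %d passes" % max_passes
    )
```

The pass limit guards against documents whose auxiliary data never
settles.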
landry@ ok to import (though slight tweaks since the version he saw).
Don't redirect errors to /dev/null and don't return true(1)
unconditionally. Instead, don't check for the existence of index.theme.
This will allow us to catch errors that may be happening because of a
missing dependency in the chain.
Some hidden issues may appear, in which case please contact me.
discussed with and ok blind jasper@
do not print "OpenBSD 5.0" but "OpenBSD ports".
While here, remove some noise from the groff build log.
Bump the groff package.
Technically, this changes the contents of all packages that USE_GROFF,
but please refrain from bumping the world: Having "OpenBSD 5.0" in
the footers of some ports' manuals until they are updated the next time,
or until the next libc bump if they aren't, is not a real problem.
string "OpenBSD ports" suggested and patch ok'ed by sthen@
doesn't depend on anything, so it wouldn't get automatically updated, so an
old package built with a pkg_create which used @ignore annotations wouldn't
have been replaced. pkg_add warning reported by kettenis@.
* the \r character was not handled correctly
* Added support for flexible tabize wished
* some highlighting mistakes were introduced by the last
bugfix.
Tested on i386. While here USE_GROFF is not needed.
OK okan@, aja@
GtkSpell provides word-processor-style highlighting and replacement of
misspelled words in a GtkTextView widget. Right-clicking a misspelled
word pops up a menu of suggested replacements.
ok jasper@
GCC hates and uses 800MB+ to compile, and embedding with .incbin.
Switch the port to using .incbin. Fixes out of memory on alpha reported
and tested by naddy@, greatly improves build time on arm.
The VMEM_WARNING can now be removed.
- while there, don't use groff.
Latexmk is a perl script for running LaTeX the correct number of times
to resolve cross references, etc; it also runs auxiliary programs
(bibtex, makeindex if necessary, and dvips and/or a previewer as
requested).
<...>
ok jasper@
This module provides a utility method, "to_identifier", for converting
an arbitrary string into a readable representation using the ASCII
subset of "\w" for use as an identifier in a computer program. The
intent is to make unique identifier names from which the content
of the original string can be easily inferred by a human just by
reading the identifier.
If you need the full set of "\w" including Unicode, see the subclass
String::ToIdentifier::EN::Unicode.
Currently, this process is one way only, and will likely remain
this way.
The default is to create camelCase identifiers, or you may pass in
a separator char of your choice such as "_".
Binary char groups will be separated by "_" even in camelCase
identifiers to make them easier to read, e.g.: "foo_2_0xFF_Bar".
The module is a probability-based, corpus-trained tagger that assigns
POS tags to English text based on a lookup dictionary and a set of
probability values. The tagger assigns appropriate tags based on
conditional probabilities - it examines the preceding tag to determine
the appropriate tag for the current word. Unknown words are classified
according to word morphology or can be set to be treated as nouns
or other parts of speech. The tagger also extracts as many nouns
and noun phrases as it can, using a set of regular expressions.
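
The tagging approach described above can be sketched in Python as a greedy
tagger that weighs each candidate tag by the preceding tag. Everything in
this sketch is an assumption made for illustration: the tag names are
simplified, the probabilities are invented toy values rather than trained
data, and the function name tag is hypothetical.

```python
# Toy lookup dictionary: P(tag | word). Values are invented.
LEXICON = {
    "the": {"DET": 1.0},
    "dog": {"NOUN": 0.9, "VERB": 0.1},
    "walks": {"NOUN": 0.4, "VERB": 0.6},
}

# Toy conditional probabilities: P(tag | preceding tag). Values are invented.
TRANSITIONS = {
    ("START", "DET"): 0.6,
    ("START", "NOUN"): 0.3,
    ("START", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9,
    ("DET", "VERB"): 0.1,
    ("NOUN", "NOUN"): 0.3,
    ("NOUN", "VERB"): 0.7,
}


def tag(words, unknown_tag="NOUN"):
    """Greedily tag words, conditioning each choice on the previous tag."""
    prev = "START"
    result = []
    for word in words:
        candidates = LEXICON.get(word.lower())
        if candidates is None:
            # Unknown words fall back to a configurable default tag.
            best = unknown_tag
        else:
            best = max(
                candidates,
                key=lambda t: candidates[t] * TRANSITIONS.get((prev, t), 1e-6),
            )
        result.append((word, best))
        prev = best
    return result
```

With these toy values, "walks" is tagged as a verb after a noun because
the transition from NOUN to VERB outweighs its slightly ambiguous entry
in the lookup dictionary.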
The exportable subroutines of Lingua::EN::Inflect provide
plural inflections, "a"/"an" selection for English words,
and manipulation of numbers as words.