Hpricot is a fast, flexible HTML parser written in C. It's designed
to be very accommodating (like Tanaka Akira's HTree) and to have a
very helpful library (like the ones some JavaScript libraries, such as
jQuery and Prototype, give you). The XPath and CSS parser, in fact, is
based on John Resig's jQuery.

Also, Hpricot can be handy for reading broken XML files, since many of
the same techniques can be used. If a quote is missing, Hpricot tries
to figure it out. If tags overlap, Hpricot works on sorting them out.