lxml.html
Whilst playing around with lxml, I came across some code in xdv that was causing problems with <script> and <style> tags. The contents of these tags were being escaped, which broke them.
Turns out xdv was using lxml.etree.tostring and should have been using lxml.html.tostring.
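Here is a minimal sketch of the difference, assuming lxml is installed. The sample page and the variable names are made up for illustration and are not taken from xdv:

```python
from lxml import etree
import lxml.html

# A tiny made-up page with a "<" inside the script body.
page = lxml.html.fromstring(
    "<html><head><script>if (1 < 2) { go(); }</script></head>"
    "<body><p>hi</p></body></html>"
)

# XML serialisation: the "<" in the script body is escaped to "&lt;".
xml_out = etree.tostring(page, encoding="unicode")

# HTML serialisation: the script body is written out as-is.
html_out = lxml.html.tostring(page, encoding="unicode")

print(xml_out)
print(html_out)
```

Passing method="html" to lxml.etree.tostring should give the same result, but lxml.html.tostring reads better.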
lxml.html.tostring does not escape the contents of certain HTML tags (and probably does other things too). The nice thing I found while looking at the lxml.html docs is that there are some really handy methods for dealing with HTML, e.g.:
.iterlinks(), which returns an iterator over all the links in the document:
This yields (element, attribute, link, pos) for every link in the document. attribute may be None if the link is in the text (as will be the case with a <style> tag with @import).
This finds any link in an action, archive, background, cite, classid, codebase, data, href, longdesc, profile, src, usemap, dynsrc, or lowsrc attribute. It also searches style attributes for url(link), and <style> tags for @import and url().
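As a rough sketch of what that looks like in practice (again, the sample markup and names here are my own, purely for illustration):

```python
import lxml.html

# Made-up page with a link in an href, a src, and a <style> @import.
page = lxml.html.fromstring(
    '<html><head><style>@import "base.css";</style></head>'
    '<body><a href="/about">About</a><img src="logo.png"></body></html>'
)

# Each item is (element, attribute, link, pos); we keep the tag name,
# the attribute and the link, and drop the position.
links = [(el.tag, attr, link) for el, attr, link, pos in page.iterlinks()]

for tag, attr, link in links:
    # attr is None for the @import found in the <style> text
    print(tag, attr, link)
```

The pos value is the offset of the link within the attribute value or text, which is handy if you want to rewrite the link in place.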
More info can be found on the lxml.html page.

