Question

Hey, I have just started using Python and I want to use it with a bit of XPath. The thing is, when I print the result of the query I only get [] and I don't know why =S

    import libxml2, urllib

    doc = libxml2.parseDoc(urllib.urlopen("http://www.domain.com/").read())
    result = doc.xpathEval("//th//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a")

    if result != []:
        print result
    elif result == "":
        print "null"
    else:
        print result

    doc.freeDoc()

I get no error whatsoever, just a []. What could it be? Also, is there any better documentation for libxml2 than the one here? I find it really confusing =S

Edit

I changed the code, so now I get more than the []. I get the following output, which should be related to the invalid HTML I'm trying to parse (it's not mine, so I can't modify it). Any ideas on how to tell Python to be more forgiving about that?

    Entity: line 3552: parser error : Premature end of data in tag tr line 209
    Entity: line 3552: parser error : Premature end of data in tag tbody line 208
    Entity: line 3552: parser error : Premature end of data in tag table line 207
    Entity: line 3552: parser error : Premature end of data in tag input line 206
    Entity: line 3552: parser error : Premature end of data in tag input line 205
    Entity: line 3552: parser error : Premature end of data in tag form line 204
    Entity: line 3552: parser error : Premature end of data in tag table line 99
    Entity: line 3552: parser error : Premature end of data in tag div line 98
    Entity: line 3552: parser error : Premature end of data in tag body line 96
    Entity: line 3552: parser error : Premature end of data in tag html line 3
    Traceback (most recent call last):
      File "C:\Python26\lib\site-packages\libxml2.py", line 1263, in parseDoc
        if ret is None:raise parserError('xmlParseDoc() failed')
    libxml2.parserError: xmlParseDoc() failed

It's actually a longer list, but there's no point in placing it all here, since all the errors are due to invalid HTML.

Answer 1

It could be that your XPath doesn't select any elements. For example, you are looking for td elements inside th elements, but those elements are peers and shouldn't nest.

Why do you say (count(preceding-sibling::*) + 1) = 2 instead of count(preceding-sibling::*) = 1?

If you use a simpler XPath, do you get the results you expect?
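To illustrate the nesting point, here is a minimal sketch. It is not the asker's libxml2 code: it assumes Python 3 and uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) on a small, well-formed table that is made up for the example, since the real page's markup is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table; the markup of the real page is unknown.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Link</th></tr>
  <tr><td>first</td><td><a href="/a">A</a></td></tr>
  <tr><td>second</td><td><a href="/b">B</a></td></tr>
</table>
"""

root = ET.fromstring(SAMPLE)

# td nested somewhere under th: th and td are siblings inside tr,
# so this path has nothing to match in a normally structured table.
nested = root.findall(".//th//td")

# td directly under tr: this is where the cells actually live.
cells = root.findall(".//tr/td")

print(len(nested))  # 0
print(len(cells))   # 4
```

Starting from a path that is known to match and then adding predicates one at a time makes it easier to see which step empties the result.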

Answer 2

Are you confusing th and tr? Change your th to tr.

Answer 3

Side note: Where does all that unnecessary complexity in your XPath come from? This:

//th//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a

is equivalent to:

//th//td[count(preceding-sibling::*) = 1]//a

and very probably even to:

//th/td[2]//a
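As a rough check of the positional shorthand, the sketch below compares td[2] with a hand-written version of count(preceding-sibling::*) = 1. It assumes Python 3 and the standard library's xml.etree.ElementTree rather than libxml2, uses tr as the parent (th would not contain td cells), and runs on a made-up table. ElementTree does not support count() or the preceding-sibling axis, so that condition is emulated with a plain loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table; every child of each tr is a td.
SAMPLE = """
<table>
  <tr><td>first</td><td><a href="/a">A</a></td><td><a href="/x">X</a></td></tr>
  <tr><td>second</td><td><a href="/b">B</a></td><td>plain</td></tr>
</table>
"""

root = ET.fromstring(SAMPLE)

# Positional form: the second td child of each tr, then any descendant a.
by_position = [a.get("href") for a in root.findall(".//tr/td[2]//a")]

# Emulation of count(preceding-sibling::*) = 1:
# a child at index 1 has exactly one preceding element sibling.
by_count = []
for row in root.iter("tr"):
    for index, child in enumerate(row):
        if child.tag == "td" and index == 1:
            by_count.extend(a.get("href") for a in child.iter("a"))

print(by_position)  # ['/a', '/b']
print(by_count)     # ['/a', '/b']
```

The two forms agree here because the only preceding sibling of each selected cell is itself a td. If a row mixed th and td cells, td[2] would count only td siblings while preceding-sibling::* counts every element, which is presumably why the answer says "very probably".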
