Python XPath Result displaying only []

Hey, I have just started using Python and I want to use it with a bit of XPath. The thing is, when I print the result of the query I only get [] and I don't know why =S

    import libxml2, urllib

    doc = libxml2.parseDoc(urllib.urlopen("http://www.domain.com/").read())
    result = doc.xpathEval("//th//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a")

    if result != []:
        print result
    elif result == "":
        print "null"
    else:
        print result

    doc.freeDoc()

I get no error whatsoever, just a []. What could it be? Also, is there any better documentation for libxml2 than the one here? I find it reaaaally confusing =S


Edit

I changed the code, so now I get more than the []: I get the following output, which should be related to the non-validity of the HTML I'm trying to parse (but it's not mine, so I can't modify it). Any ideas on how to tell Python to be more forgiving about that?

    Entity: line 3552: parser error : Premature end of data in tag tr line 209
    ^
    Entity: line 3552: parser error : Premature end of data in tag tbody line 208
    ^
    Entity: line 3552: parser error : Premature end of data in tag table line 207
    ^
    Entity: line 3552: parser error : Premature end of data in tag input line 206
    ^
    Entity: line 3552: parser error : Premature end of data in tag input line 205
    ^
    Entity: line 3552: parser error : Premature end of data in tag form line 204
    ^
    Entity: line 3552: parser error : Premature end of data in tag table line 99
    ^
    Entity: line 3552: parser error : Premature end of data in tag div line 98
    ^
    Entity: line 3552: parser error : Premature end of data in tag body line 96
    ^
    Entity: line 3552: parser error : Premature end of data in tag html line 3
    ^
    Traceback (most recent call last):
      File "C:\Python26\lib\site-packages\libxml2.py", line 1263, in parseDoc
        if ret is None:raise parserError('xmlParseDoc() failed')
    libxml2.parserError: xmlParseDoc() failed

It's actually a longer list, but there's no point in placing it all here, since all the errors are due to invalid HTML.

Answers

It could be that your XPath doesn't select any elements. For example, you are looking for td elements inside th elements, but those elements are peers and shouldn't nest.

Why do you say (count(preceding-sibling::*) + 1) = 2 instead of count(preceding-sibling::*) = 1?

If you use a simpler XPath, do you get the results you expect?
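One way to try the "simpler XPath" suggestion is to evaluate progressively simpler paths against a small, well-formed table and see where the match count drops to zero. This is only a sketch under assumptions: it is written for Python 3, the sample markup is made up, and it uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) as a stand-in for libxml2.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample table (not the page from the question).
SAMPLE = """
<table>
  <tr><th>Name</th><th>Link</th></tr>
  <tr><td>first</td><td><a href="/one">one</a></td></tr>
  <tr><td>second</td><td><a href="/two">two</a></td></tr>
</table>
"""

root = ET.fromstring(SAMPLE)

# Start from the path in the question and simplify step by step.
paths = [
    ".//th//td",       # td nested inside th: does not occur in a normal table
    ".//td",           # any td at all
    ".//tr/td",        # td as a child of tr
    ".//tr/td[2]//a",  # links inside the second td of each row
]

# Count the matches for each path to see which step selects nothing.
counts = {path: len(root.findall(path)) for path in paths}
for path in paths:
    print(path, "->", counts[path])

# Collect the href of every link found in the second td of a row.
links = [a.get("href") for a in root.findall(".//tr/td[2]//a")]
print(links)
```

If the th-based path matches nothing while the tr-based one does, the empty list comes from the expression itself rather than from the parser.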

Are you confusing th and tr? Change your th to tr.

Side note: Where does all that unnecessary complexity in your XPath come from? This:

//th//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a

is equivalent to:

//th//td[count(preceding-sibling::*) = 1]//a

and very probably even to:

//th/td[2]//a
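The "very probably" matters: count(preceding-sibling::*) = 1 picks a td that is the second child element of any kind, while td[2] picks the second td child. The two agree when a row contains only td cells and can differ when another element comes first. The sketch below illustrates this under assumptions: it is Python 3, the sample markup is invented, tr is used as the parent, and because xml.etree.ElementTree cannot evaluate count(), the hypothetical helper second_child_tds emulates that predicate in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second row starts with a th instead of a td.
SAMPLE = """
<table>
  <tr><td>a</td><td><a href="/x">x</a></td><td>c</td></tr>
  <tr><th>h</th><td><a href="/y">y</a></td><td><a href="/z">z</a></td></tr>
</table>
"""

root = ET.fromstring(SAMPLE)


def second_child_tds(tree_root):
    """Emulate td[count(preceding-sibling::*) = 1]: every td that has
    exactly one preceding sibling element, whatever that sibling is."""
    found = []
    for parent in tree_root.iter():
        children = list(parent)
        if len(children) > 1 and children[1].tag == "td":
            found.append(children[1])
    return found


by_count = second_child_tds(root)          # second child element, if it is a td
by_position = root.findall(".//tr/td[2]")  # second td child of each tr

count_links = [a.get("href") for td in by_count for a in td.iter("a")]
position_links = [a.get("href") for td in by_position for a in td.iter("a")]

print(count_links)
print(position_links)
```

In the first row both forms select the same cell; in the second row the leading th shifts td[2] one cell to the right.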



