Force python mechanize/urllib2 to only use A requests?

Here is a related question, but I could not figure out how to apply its answer to mechanize/urllib2: "how to force python httplib library to use only A requests".

Basically, given this simple code:

#!/usr/bin/python
import urllib2
print urllib2.urlopen("http://python.org/").read(100)

This results in wireshark saying the following:

  0.000000  10.102.0.79 -> 8.8.8.8      DNS Standard query A python.org
  0.000023  10.102.0.79 -> 8.8.8.8      DNS Standard query AAAA python.org
  0.005369      8.8.8.8 -> 10.102.0.79  DNS Standard query response A 82.94.164.162
  5.004494  10.102.0.79 -> 8.8.8.8      DNS Standard query A python.org
  5.010540      8.8.8.8 -> 10.102.0.79  DNS Standard query response A 82.94.164.162
  5.010599  10.102.0.79 -> 8.8.8.8      DNS Standard query AAAA python.org
  5.015832      8.8.8.8 -> 10.102.0.79  DNS Standard query response AAAA 2001:888:2000:d::a2

That's a 5 second delay!

I don't have IPv6 enabled anywhere in my system (gentoo compiled with USE=-ipv6) so I don't think that python has any reason to even try an IPv6 lookup.

The question referenced above suggested explicitly setting the socket family to AF_INET, which sounds great. However, I have no idea how to make urllib2 or mechanize use sockets that I create.

EDIT: I know that the AAAA queries are the issue because other apps had the delay as well and as soon as I recompiled with ipv6 disabled, the problem went away... except for in python which still performs the AAAA requests.

Answers

I had the same problem; here is an ugly hack (use at your own risk) based on the information given by J.J.

It forces the family parameter of socket.getaddrinfo(..) to socket.AF_INET instead of socket.AF_UNSPEC (zero, which seems to be what socket.create_connection uses). Note that this affects every call to socket.getaddrinfo(..) in the process, not only those made by urllib2:

#--------------------
# do this once at program startup
#--------------------
import socket
origGetAddrInfo = socket.getaddrinfo

def getAddrInfoWrapper(host, port, family=0, socktype=0, proto=0, flags=0):
    return origGetAddrInfo(host, port, socket.AF_INET, socktype, proto, flags)

# replace the original socket.getaddrinfo by our version
socket.getaddrinfo = getAddrInfoWrapper

#--------------------
import urllib2

print urllib2.urlopen("http://python.org/").read(100)

This works for me at least in this simple case.
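If patching socket.getaddrinfo for the whole program feels too heavy-handed, the same hack can be limited to a block of code. The following is only a sketch: the name ipv4_only is made up for illustration, and it is written so that it should run on both Python 2 and Python 3. It is not thread-safe, since the replacement is still global while the block runs.

```python
import contextlib
import socket


@contextlib.contextmanager
def ipv4_only():
    """Temporarily make socket.getaddrinfo ask for AF_INET only.

    Hypothetical helper, not part of the stdlib. The patch is global
    for the duration of the block, so other threads are affected too.
    """
    original = socket.getaddrinfo

    def ipv4_getaddrinfo(host, port, family=0, *args, **kwargs):
        # Ignore whatever family the caller asked for and force IPv4.
        return original(host, port, socket.AF_INET, *args, **kwargs)

    socket.getaddrinfo = ipv4_getaddrinfo
    try:
        yield
    finally:
        # Always restore the real function, even if the block raised.
        socket.getaddrinfo = original
```

Usage would be along the lines of "with ipv4_only(): urllib2.urlopen(...)", with the original getaddrinfo back in place once the block exits.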

No answer, but a few data points. The DNS resolution appears to originate in httplib.py, in HTTPConnection.connect() (line 670 in my Python 2.5.4 stdlib).

The code flow is roughly:

for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
    af, socktype, proto, canonname, sa = res
    self.sock = socket.socket(af, socktype, proto)
    try:
        self.sock.connect(sa)
    except socket.error, msg: 
        continue
    break

A few comments on what's going on:

  • the third argument to socket.getaddrinfo() limits the socket families -- i.e., IPv4 vs. IPv6. Passing zero returns all families. Zero is hardcoded into the stdlib.

  • passing a hostname into getaddrinfo() will cause name resolution -- on my OS X box with IPv6 enabled, both A and AAAA records go out, both answers come right back and both are returned.

  • the rest of the connect loop tries each returned address until one succeeds

For example:

>>> socket.getaddrinfo("python.org", 80, 0, socket.SOCK_STREAM)
[
 (30, 1, 6, '', ('2001:888:2000:d::a2', 80, 0, 0)),
 ( 2, 1, 6, '', ('82.94.164.162', 80))
]
>>> help(socket.getaddrinfo)
getaddrinfo(...)
    getaddrinfo(host, port [, family, socktype, proto, flags])
        -> list of (family, socktype, proto, canonname, sockaddr)
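By contrast, passing socket.AF_INET as the third argument should restrict the result to IPv4 entries (and, on the resolvers I know of, to A queries). A small sketch, where ipv4_addresses is a made-up helper name and not something from the stdlib:

```python
import socket


def ipv4_addresses(host, port):
    """Return the distinct IPv4 addresses getaddrinfo reports for host.

    Hypothetical helper for illustration; it only requests AF_INET,
    so no IPv6 entries can appear in the result.
    """
    results = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in results:
        ip = sockaddr[0]  # for AF_INET, sockaddr is (address, port)
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling ipv4_addresses("python.org", 80) and watching the traffic in wireshark is one way to check whether the AAAA query really disappears on a given system.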

Some guesses:

  • Since the socket family in getaddrinfo() is hardcoded to zero, you won't be able to override the A vs. AAAA records through some supported API interface in urllib. Unless mechanize does their own name resolution for some other reason, mechanize can't either. From the construct of the connect loop, this is By Design.

  • python's socket module is a thin wrapper around the POSIX socket APIs; I expect they're resolving every family available & configured on the system. Double-check Gentoo's IPv6 configuration.
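One rough way to see what Python itself thinks about IPv6 on the box is to check socket.has_ipv6 and then try to bind an IPv6 socket to the loopback address. This is only a sketch (ipv6_usable is a made-up name), and it says nothing about whether the resolver will still send AAAA queries; it only shows whether IPv6 sockets work at all.

```python
import socket


def ipv6_usable():
    """Best-effort check that an IPv6 socket can be created and bound.

    Hypothetical helper. socket.has_ipv6 only reports platform/build
    support, so we also try to bind to the IPv6 loopback address.
    """
    if not socket.has_ipv6:
        return False
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except socket.error:
        # The kernel refused to create an IPv6 socket.
        return False
    try:
        sock.bind(("::1", 0))  # port 0 lets the OS pick any free port
    except socket.error:
        return False
    finally:
        sock.close()
    return True
```

If this returns False while AAAA queries still go out, that would support the idea that the lookups are driven by the hardcoded zero family rather than by the system's IPv6 state.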

The DNS server 8.8.8.8 (Google DNS) replies immediately when asked about the AAAA record of python.org. The fact that we do not see this reply in the trace you posted therefore probably indicates that the packet never came back (which happens with UDP). If this loss is random, it is normal. If it is systematic, it means there is a problem in your network setup, maybe a broken firewall which prevents the first AAAA reply from coming back.

The 5-second delay comes from your stub resolver. If the loss is random, it is probably bad luck and unrelated to IPv6; the reply for the A record could have been lost as well.
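To tell a random loss from a systematic one, it may help to time the lookup repeatedly from Python, once with the default family and once with AF_INET only. A minimal sketch, with time_lookup being a made-up helper name:

```python
import socket
import time


def time_lookup(host, port=80, family=socket.AF_UNSPEC):
    """Return how many seconds one getaddrinfo call for host takes.

    Hypothetical helper for illustration. Resolution failures are
    swallowed because only the elapsed time is of interest here.
    """
    start = time.time()
    try:
        socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        pass
    return time.time() - start
```

If time_lookup("python.org") is consistently about five seconds slower than time_lookup("python.org", family=socket.AF_INET), the delay is systematic and tied to the AAAA query rather than bad luck.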

Disabling IPv6 seems a very strange move, only two years before the last IPv4 address is distributed!

% dig @8.8.8.8  AAAA python.org

; <<>> DiG 9.5.1-P3 <<>> @8.8.8.8 AAAA python.org
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50323
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;python.org.                    IN      AAAA

;; ANSWER SECTION:
python.org.             69917   IN      AAAA    2001:888:2000:d::a2

;; Query time: 36 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sat Jan  9 21:51:14 2010
;; MSG SIZE  rcvd: 67

The most likely cause of this is a broken egress firewall. Juniper firewalls can cause this, for instance, though they have a workaround available.

If you can't get your network admins to fix the firewall, you can try the host-based workaround. Add this line to your /etc/resolv.conf:

options single-request-reopen

The man page explains it well:

The resolver uses the same socket for the A and AAAA requests. Some hardware mistakenly only sends back one reply. When that happens the client system will sit and wait for the second reply. Turning this option on changes this behavior so that if two requests from the same port are not handled correctly it will close the socket and open a new one before sending the second request.
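To confirm the option is actually in place, the file can be read back and its "options" lines collected. This is a rough sketch: resolver_options is a made-up helper, and it only understands the basic resolv.conf layout (one directive per line, comments starting with # or ;).

```python
def resolver_options(text):
    """Collect every option named on 'options' lines of resolv.conf text.

    Hypothetical helper for illustration; text is the file's contents
    as a string, e.g. open("/etc/resolv.conf").read().
    """
    options = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] == "options":
            options.update(fields[1:])
    return options
```

Checking for "single-request-reopen" in the returned set shows whether the workaround is configured on that host.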




