Parsing date with timezone from an email?

I am trying to retrieve the date from an email. At first it's easy:

message = email.parser.Parser().parse(file)
date = message['Date']
print(date)

and I receive:

 Mon, 16 Nov 2009 13:32:02 +0100 

But I need a nice datetime object, so I use:

datetime.strptime('Mon, 16 Nov 2009 13:32:02 +0100', '%a, %d %b %Y %H:%M:%S %Z')

which raises ValueError, since %Z isn't the right format for +0100. But I can't find the proper format for a numeric timezone offset in the documentation; there is only %Z for the zone name. Can someone help me with that?

Best answer

email.utils has a parsedate() function for the RFC 2822 format, which as far as I know is not deprecated.

>>> import email.utils
>>> import time
>>> import datetime
>>> email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')
(2009, 11, 16, 13, 32, 2, 0, 1, -1)
>>> time.mktime((2009, 11, 16, 13, 32, 2, 0, 1, -1))
1258378322.0
>>> datetime.datetime.fromtimestamp(1258378322.0)
datetime.datetime(2009, 11, 16, 13, 32, 2)

Please note, however, that the parsedate method does not take into account the time zone and time.mktime always expects a local time tuple.

>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) ==
... time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')))
True

So you'll still need to parse out the time zone and take into account the local time difference, too:

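>>> REMOTE_TIME_ZONE_OFFSET = +9 * 60 * 60
>>> # mktime() treats the tuple as local time, so undo the local offset first
>>> # (time.timezone is seconds west of UTC; this assumes no local DST).
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) -
... time.timezone - REMOTE_TIME_ZONE_OFFSET)
1258345922.0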
Answers

Use email.utils.parsedate_tz(date):

msg=email.message_from_file(open(file_name))
date=None
date_str=msg.get('date')
if date_str:
    date_tuple=email.utils.parsedate_tz(date_str)
    if date_tuple:
        date=datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
if date:
    ... # valid date found
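A minimal sketch (Python 3 assumed) of why parsedate_tz() plus mktime_tz() avoids the manual offset arithmetic: the same wall-clock time with two different offsets yields two different UTC timestamps. The variable names are illustrative only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_tz, mktime_tz

# Same wall-clock time, two different UTC offsets.
ts_plus9 = mktime_tz(parsedate_tz("Mon, 16 Nov 2009 13:32:02 +0900"))
ts_plus1 = mktime_tz(parsedate_tz("Mon, 16 Nov 2009 13:32:02 +0100"))

# mktime_tz() returns a UTC timestamp, so the offsets are honored:
# the +0900 instant is 8 hours earlier than the +0100 one.
print(ts_plus1 - ts_plus9)  # 28800

# Build an aware UTC datetime rather than a naive local one.
print(datetime.fromtimestamp(ts_plus1, tz=timezone.utc))  # 2009-11-16 12:32:02+00:00
```

Passing tz=timezone.utc to fromtimestamp() keeps the result independent of the machine's local time zone.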

For Python 3.3+ you can use the parsedate_to_datetime function:

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
datetime.datetime(2009, 11, 16, 13, 32, 2, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

Official documentation:

The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding a timezone tzinfo. New in version 3.3.
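The -0000 special case described in the documentation can be checked with a short sketch like this one (Python 3.3+ assumed; the variable names are illustrative):

```python
from email.utils import parsedate_to_datetime

# -0000 means "UTC, but the source zone is unknown": the result is naive.
naive = parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 -0000")
# Any other valid offset, including +0000, gives an aware datetime.
aware = parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0000")

print(naive.tzinfo)       # None
print(aware.utcoffset())  # 0:00:00
```

Because naive and aware datetimes cannot be compared with each other, code that handles arbitrary mail may want to check tzinfo before doing any arithmetic.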

In Python 3.3+, the email package can parse the headers for you:

import email
import email.policy

headers = email.message_from_file(file, policy=email.policy.default)
print(headers.get('date').datetime)
# -> 2009-11-16 13:32:02+01:00
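A self-contained variant of the same idea, parsing from a string instead of a file. The raw message below is made up for illustration.

```python
import email
import email.policy

# Hypothetical minimal message; only the Date header matters here.
raw = "Date: Mon, 16 Nov 2009 13:32:02 +0100\nSubject: test\n\nbody\n"
msg = email.message_from_string(raw, policy=email.policy.default)

# With policy.default the Date header is a structured object
# that exposes the parsed value as a .datetime attribute.
date_header = msg["Date"]
print(date_header.datetime.isoformat())  # 2009-11-16T13:32:02+01:00
```

If the message has no Date header, msg["Date"] is None, so real code should check for that before touching .datetime.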

Since Python 3.2, it works if you replace %Z with %z:

>>> from datetime import datetime
>>> datetime.strptime("Mon, 16 Nov 2009 13:32:02 +0100", 
...                   "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2009, 11, 16, 13, 32, 2,
                  tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

Or using email package (Python 3.3+):

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0100")
datetime.datetime(2009, 11, 16, 13, 32, 2,
                  tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

If the UTC offset is specified as -0000, it returns a naive datetime object that represents the time in UTC; otherwise it returns an aware datetime object with the corresponding tzinfo set.

To parse an RFC 5322 date-time string on earlier Python versions (2.6+):

from calendar import timegm
from datetime import datetime, timedelta, tzinfo
from email.utils import parsedate_tz

ZERO = timedelta(0)
time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
#NOTE: mktime_tz is broken on Python < 2.7.4,
#  see https://bugs.python.org/issue21267
timestamp = timegm(tt) - tt[9] # local time - utc offset == utc time
naive_utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
aware_utc_dt = naive_utc_dt.replace(tzinfo=FixedOffset(ZERO, 'UTC'))
aware_dt = aware_utc_dt.astimezone(FixedOffset(timedelta(seconds=tt[9])))
print(aware_utc_dt)
print(aware_dt)
# -> 2009-11-16 12:32:02+00:00
# -> 2009-11-16 13:32:02+01:00

where FixedOffset is based on tzinfo subclass from the datetime documentation:

class FixedOffset(tzinfo):
    """Fixed UTC offset: `time = utc_time + utc_offset`."""
    def __init__(self, offset, name=None):
        self.__offset = offset
        if name is None:
            seconds = abs(offset).seconds
            assert abs(offset).days == 0
            hours, seconds = divmod(seconds, 3600)
            if offset < ZERO:
                hours = -hours
            minutes, seconds = divmod(seconds, 60)
            assert seconds == 0
            #NOTE: the last part is to remind about deprecated POSIX
            #  GMT+h timezones that have the opposite sign in the
            #  name; the corresponding numeric value is not used e.g.,
            #  no minutes
            self.__name = '<%+03d%02d>GMT%+d' % (hours, minutes, -hours)
        else:
            self.__name = name
    def utcoffset(self, dt=None):
        return self.__offset
    def tzname(self, dt=None):
        return self.__name
    def dst(self, dt=None):
        return ZERO
    def __repr__(self):
        return 'FixedOffset(%r, %r)' % (self.utcoffset(), self.tzname())
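On Python 3 the hand-written tzinfo subclass is probably unnecessary, since datetime.timezone already models a fixed UTC offset. A sketch of the same conversion under that assumption:

```python
from calendar import timegm
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz

tt = parsedate_tz("Mon, 16 Nov 2009 13:32:02 +0100")
timestamp = timegm(tt) - tt[9]  # local time - utc offset == utc time

# datetime.timezone stands in for the FixedOffset class.
aware_utc_dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
aware_dt = aware_utc_dt.astimezone(timezone(timedelta(seconds=tt[9])))

print(aware_utc_dt)  # 2009-11-16 12:32:02+00:00
print(aware_dt)      # 2009-11-16 13:32:02+01:00
```

Both objects denote the same instant; only the offset they are displayed in differs.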

Have you tried

rfc822.parsedate_tz(date) # ?

More on RFC822, http://docs.python.org/library/rfc822.html

It's deprecated (parsedate_tz is now in email.utils.parsedate_tz), though.

But maybe these answers help:

import datetime

# Parses the Nginx format "01/Jan/1999:13:59:59 +0400".
# Unfortunately, strptime doesn't support %z for the UTC offset on Python 2
# (despite what the docs actually say), hence the need for this function.
# NOTE: the offset is added to the parsed time, as in the original snippet;
# subtract it instead if you want the result normalized to UTC.
def parseDate(dateStr):
    date = datetime.datetime.strptime(dateStr[:-6], "%d/%b/%Y:%H:%M:%S")
    offsetDir = dateStr[-5]
    offsetHours = int(dateStr[-4:-2])
    offsetMins = int(dateStr[-2:])
    if offsetDir == "-":
        offsetHours = -offsetHours
        offsetMins = -offsetMins
    return date + datetime.timedelta(hours=offsetHours, minutes=offsetMins)
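On Python 3.2+, where strptime does accept %z, the same Nginx timestamp can likely be parsed without manual slicing. A short sketch (it assumes an English/C locale, because %b matches locale-dependent month names):

```python
from datetime import datetime, timezone

# %z consumes the numeric "+0400" offset and yields an aware datetime.
dt = datetime.strptime("01/Jan/1999:13:59:59 +0400", "%d/%b/%Y:%H:%M:%S %z")

print(dt.isoformat())                           # 1999-01-01T13:59:59+04:00
print(dt.astimezone(timezone.utc).isoformat())  # 1999-01-01T09:59:59+00:00
```

Converting to UTC subtracts the offset: 13:59:59 at +04:00 is 09:59:59 UTC.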

For those who want to get the correct local time, here is what I did:

from datetime import datetime
from email.utils import parsedate_to_datetime

mail_time_str = 'Mon, 16 Nov 2009 13:32:02 +0100'

local_time_str = datetime.fromtimestamp(parsedate_to_datetime(mail_time_str).timestamp()).strftime('%Y-%m-%d %H:%M:%S')

print(local_time_str)
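A possibly simpler route to local time, assuming Python 3.3+ and a Date header with a real offset (not -0000): calling astimezone() with no argument converts an aware datetime to the system's local zone and keeps the result aware.

```python
from email.utils import parsedate_to_datetime

mail_dt = parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0100")

# No argument: convert to the system local time zone.
local_dt = mail_dt.astimezone()

# The printed value depends on the machine's time zone setting.
print(local_dt.strftime("%Y-%m-%d %H:%M:%S %z"))
```

Unlike the timestamp round trip, the result still carries its offset, so it can be compared directly with other aware datetimes.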

ValueError: 'z' is a bad directive in format...

(note: I have to stick to python 2.7 in my case)

I have had a similar problem parsing commit dates from the output of git log --date=iso8601, which actually isn't the ISO 8601 format (hence the addition of --date=iso8601-strict in a later version).

Since I am using django I can leverage the utilities there.

https://github.com/django/django/blob/master/django/utils/dateparse.py

>>> from django.utils.dateparse import parse_datetime
>>> parse_datetime('2013-07-23T15:10:59.342107+01:00')
datetime.datetime(2013, 7, 23, 15, 10, 59, 342107, tzinfo=+0100)

Instead of strptime you could use your own regular expression.
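One way the regular-expression idea might look for the email date from the question; this is only a sketch, and OFFSET_RE and parse_to_naive_utc are hypothetical names, not part of any library. It uses no %z, so it should also run on Python 2.7.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern: everything before the offset, then sign, HH, MM.
OFFSET_RE = re.compile(
    r"^(?P<stamp>.*\S)\s+(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})$")

def parse_to_naive_utc(date_str, fmt="%a, %d %b %Y %H:%M:%S"):
    """Parse 'stamp +HHMM' and return a naive datetime in UTC."""
    match = OFFSET_RE.match(date_str.strip())
    if match is None:
        raise ValueError("no numeric UTC offset in %r" % date_str)
    local = datetime.strptime(match.group("stamp"), fmt)
    offset = timedelta(hours=int(match.group("hh")),
                       minutes=int(match.group("mm")))
    if match.group("sign") == "-":
        offset = -offset
    return local - offset  # UTC = local time - UTC offset

print(parse_to_naive_utc("Mon, 16 Nov 2009 13:32:02 +0100"))  # 2009-11-16 12:32:02
```

The sketch handles numeric offsets only; obsolete zone names such as "GMT" or "EST" raise ValueError here, which email.utils.parsedate_tz handles for you.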




