Python OLE2 date format conversion
  • Time: 2009-11-30 03:57:52
  • Tags: python, ole

I have created a Python script which pulls data out of OLE streams in Word documents, but I am having trouble converting the OLE2-formatted timestamp to something more human-readable :(

The timestamp which is pulled out is 12760233021 but I cannot for the life of me convert this to a date like 12 Mar 2007 or similar.

Any help is greatly appreciated.

EDIT: OK, I have run the script over one of my Word documents, which was created on 31/10/2009, 10:05:00. The Create Date in the OLE DocumentSummaryInformation stream is 12901417500.

Another example: a Word doc created on 27/10/2009, 15:33:00 gives a Create Date of 12901091580 in the OLE DocumentSummaryInformation stream.

The MSDN documentation on the properties of these OLE streams is http://msdn.microsoft.com/en-us/library/aa380376%28VS.85%29.aspx

The def which pulls these streams out is given below:

import OleFileIO_PL as ole

def enumerateStreams(item):
    # item is the path to an arbitrary file
    streamProps = []
    if ole.isOleFile("%s" % item):
        loader = ole.OleFileIO("%s" % item)
        # enumerate all the OLE streams in the office file
        streams = loader.listdir()
        for stream in streams:
            if stream[0] == "\x05SummaryInformation":
                # get all the properties from the SummaryInformation OLE stream
                streamProps.append(loader.getproperties(stream))
            elif stream[0] == "\x05DocumentSummaryInformation":
                # get all the properties from the DocumentSummaryInformation stream
                streamProps.append(loader.getproperties(stream))
    # an empty list is returned when item is not an OLE file
    return streamProps
Best answer

(0) Please clarify "like 12 Mar 2007 or similar": do you mean that you expect the 11-digit int to convert to 12 Mar 2007, or is "12 Mar 2007" merely intended to convey the format in which you want to display the date? If the latter, can't you provide expected results by inspecting some files with MS Word or OpenOffice.org's word processing gadget? How do you intend to verify that any solution that is offered actually works?

(1) Please give more than one (OLE, expected) pair so that correct operation of any proposed solution can be verified with more assurance. If possible, can you create examples with known expected values like 01 Jan 2000, 01 Jan 2001, 02 Jan 2001, 02 Feb 2001?

(2) It is not obvious from "pulls data out of OLE streams" whether you want the file creation etc timestamps in the OLE2 compound document header, or whether you want timestamps that are present in the content. Please say WHERE you are trawling for timestamps. It would also help tremendously if you could give a reference to the MS documentation that relates to the timestamps you are interested in ... surely it must tell you what the format is, even if it does so indirectly by one or two intra/extra-document hops.

(3) Please show HOW you are pulling that out -- is it a string? fixed 11 bytes? Or is it str(some int that you have converted from a 64-bit field)? Converted HOW?? As well as a description, show your conversion code. Don't retype your code from memory; use copy/paste.

Please provide the requested info by editing your question, not as comments.

Update while waiting for info:

The file creation and modification timestamps in an OLE compound document header appear to be 64-bit little-endian integers representing (seconds since 1601-01-01T00:00:00) * 10 ** 7.
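
A minimal sketch of decoding such a field, assuming the 8 raw bytes have already been read from the header. The function name filetime_bytes_to_datetime and the sample value are illustrative only, not part of any library:

```python
import struct
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)  # naive datetime, interpreted as UTC


def filetime_bytes_to_datetime(raw):
    """Decode 8 little-endian bytes holding 100-ns ticks since 1601-01-01."""
    (ticks,) = struct.unpack("<Q", raw)
    # 10 ticks per microsecond; integer division drops the sub-microsecond part
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# Hypothetical sample: build the tick count for 2000-01-01 and pack it
delta = datetime(2000, 1, 1) - FILETIME_EPOCH
sample = struct.pack("<Q", (delta.days * 86400 + delta.seconds) * 10**7)
print(filetime_bytes_to_datetime(sample))
```

The result is a naive datetime in UTC; any local-time adjustment would have to be applied separately.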

The DATE type used in OLE2 data appears to be a 64-bit little-endian IEEE 754 float representing days (and a fraction thereof) since 1899-12-30T00:00:00. Yes, the day is 30, not 31.
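
As a rough sketch of the DATE layout, assuming the 8 raw bytes are already in hand and the value is on or after the epoch (negative values are not handled here). The function name ole_date_bytes_to_datetime and the sample value 2.5 are made up for illustration:

```python
import struct
from datetime import datetime, timedelta

OLE_DATE_EPOCH = datetime(1899, 12, 30)


def ole_date_bytes_to_datetime(raw):
    """Decode 8 little-endian bytes holding a double: days since 1899-12-30."""
    (days,) = struct.unpack("<d", raw)
    return OLE_DATE_EPOCH + timedelta(days=days)


# Hypothetical sample: 2.5 days after the epoch
sample = struct.pack("<d", 2.5)
print(ole_date_bytes_to_datetime(sample))  # 1900-01-01 12:00:00
```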

Update after examining the 2 examples supplied:

The difference between the two observed timestamps (which will be in your local time) is 325920 seconds:

>>> import datetime
>>> t0 = datetime.datetime(2009,10,27,15,33,0)
>>> t1 = datetime.datetime(2009,10,31,10,5,0)
>>> t1-t0
datetime.timedelta(3, 66720)
>>> secs = 3 * 24 * 60 * 60 + 66720
>>> secs
325920

This is the same as the difference between the two magic numbers:

>>> 12901417500 - 12901091580
325920

So the magic numbers represent seconds since some epoch ...

>>> m1 = 12901417500
>>> days, seconds = divmod(m1, 60*60*24)
>>> epoch = t1 - datetime.timedelta(days, seconds)
>>> epoch
datetime.datetime(1601, 1, 1, 11, 0)

So the magic numbers represent seconds since 1601-01-01T00:00:00Z and your TZ is 11 hours away from UTC.
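
Putting that conclusion to work, here is a small sketch that converts both magic numbers back to dates. The helper name ole_seconds_to_utc is hypothetical, and the fixed 11-hour offset is simply the value inferred from the epoch calculation, not something read from the file:

```python
from datetime import datetime, timedelta

EPOCH_1601 = datetime(1601, 1, 1)  # naive datetime, interpreted as UTC


def ole_seconds_to_utc(seconds):
    """Convert whole seconds since 1601-01-01T00:00:00Z to a naive UTC datetime."""
    return EPOCH_1601 + timedelta(seconds=seconds)


# Assumption: the poster's local time is UTC+11, as inferred from the epoch
local_offset = timedelta(hours=11)

for magic in (12901091580, 12901417500):
    utc = ole_seconds_to_utc(magic)
    print(magic, utc, utc + local_offset)
```

Both local results match the creation times quoted in the question (27/10/2009 15:33:00 and 31/10/2009 10:05:00).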

Those two magic numbers won't fit in 32 bits ... looks like either (a) it is stored in 64 bits as seconds since 1601 (a waste of about 29 bits!) or (b) it is stored as (number of 100-nanosecond units) since 1601 as expected but something is dividing it by 10**7 before you see it.

The documentation reference that you gave merely says that it's a VT_FILETIME (UTC) type. Googling that, I find a couple of MS clues on calling Windows functions to manipulate the timestamps, but no definition as far as I looked. However there are two 3rd party notes (from perlmonks and the Apache POI project) which say much the same thing: """This looks like a Windows VT_FILETIME data type which is a 64 bit unsigned integer representing the number of elapsed 100 nanoseconds since 1 January 1601"""

Update from the crime scene:

Seems you are using OleFileIO_PL to read the files. A quick rummage through the sole source file reveals this:

    elif type == VT_FILETIME:
        value = long(i32(s, offset+4)) + (long(i32(s, offset+8))<<32)
        # FIXME: this is a 64-bit int: "number of 100ns periods
        # since Jan 1,1601".  Should map this to Python time
        value = value / 10000000L # seconds
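
To see that this division accounts for the 11-digit numbers, here is a sketch that goes in the forward direction: it builds the 64-bit tick count for 2009-10-30 23:05:00 UTC, splits it into two 32-bit halves, and then recombines and scales them in the spirit of the snippet above. It is written for Python 3 (hence `//`) and does not use the library itself:

```python
from datetime import datetime

EPOCH_1601 = datetime(1601, 1, 1)

# 100-ns ticks between the 1601 epoch and 2009-10-30 23:05:00 UTC
delta = datetime(2009, 10, 30, 23, 5, 0) - EPOCH_1601
ticks = (delta.days * 86400 + delta.seconds) * 10**7

# Split into the low and high 32-bit halves of the 64-bit field
low, high = ticks & 0xFFFFFFFF, ticks >> 32

# Recombine the halves and scale from 100-ns ticks down to whole seconds
value = (low + (high << 32)) // 10**7
print(value)  # 12901417500
```

The printed value equals the Create Date reported for the document created on 31/10/2009 at 10:05:00 local time.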
Other answers

This question is quite old but still useful. I recently improved OleFileIO_PL to fix the issue by converting dates automatically to Python datetime.

See the documentation on this page, especially the parts about get_metadata and getproperties: https://bitbucket.org/decalage/olefileio_pl

When using get_metadata, all timestamps in standard property streams such as \x05SummaryInformation are converted to Python datetime. If you need to use getproperties instead, then use the convert_time option:

p = ole.getproperties('specialprops', convert_time=True)

Philippe.




