How to merge duplicates in 2D python arrays

I have a set of data similar to this:

#  Start_Time    End_Time      Call_Type  Info 
1  13:14:37.236  13:14:53.700  Ping1      RTT(Avr):160ms
2  13:14:58.955  13:15:29.984  Ping2      RTT(Avr):40ms
3  13:19:12.754  13:19:14.757  Ping3_1    RTT(Avr):620ms
3  13:19:12.754                Ping3_2    RTT(Avr):210ms
4  13:14:58.955  13:15:29.984  Ping4      RTT(Avr):360ms
5  13:19:12.754  13:19:14.757  Ping1      RTT(Avr):40ms
6  13:19:59.862  13:20:01.522  Ping2      RTT(Avr):163ms
...

When I parse through it, I need to merge the results of Ping3_1 and Ping3_2, take the average of those two rows, and export that as one row, so the end result would look like this:

#  Start_Time    End_Time      Call_Type  Info 
1  13:14:37.236  13:14:53.700  Ping1      RTT(Avr):160ms
2  13:14:58.955  13:15:29.984  Ping2      RTT(Avr):40ms
3  13:19:12.754  13:19:14.757  Ping3      RTT(Avr):415ms
4  13:14:58.955  13:15:29.984  Ping4      RTT(Avr):360ms
5  13:19:12.754  13:19:14.757  Ping1      RTT(Avr):40ms
6  13:19:59.862  13:20:01.522  Ping2      RTT(Avr):163ms
...

Currently, I am concatenating columns 0 and 1 to make a unique key, finding the duplicates there, and then doing the rest of the special treatment for those parallel pings. It is not elegant at all. I just wonder whether there is a better way to do it. Thanks!

Answers

Assuming your duplicates are adjacent (as they're shown in your question), itertools.groupby is the ideal way to identify them as duplicates (with a little help from operator.attrgetter to extract the "key" defining identity). Assuming you have a list of objects (the pings) with attributes such as .start and .end:

import itertools
import operator

def merge(listofpings):
  # attrgetter reads attributes; use itemgetter instead if the pings are dicts or tuples
  k = operator.attrgetter('start', 'end')
  # groupby yields (key, group) pairs; enumerate supplies the output row number
  for i, (_, grp) in enumerate(itertools.groupby(listofpings, key=k), 1):
    lst = list(grp)
    if len(lst) > 1:
      # two or more pings share the same key, so merge them into one
      item = mergepings(lst)
    else:
      item = lst[0]
    emitping(i, item)

assuming you already have functions mergepings to merge a list of > 1 "duplicate" pings, and emitping to emit a numbered ping (bare or merged).

If listofpings is not already properly sorted, add listofpings.sort(key=k) just before the for loop (presumably emitting in sorted order is OK, right?).
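As a rough end-to-end sketch of the groupby approach: the Ping record, its field names, and this particular mergepings are all assumptions made for illustration, not something from the question. Because the second Ping3 row in the sample has a blank End_Time, this version keys on the row number and start time (the same key the question builds from columns 0 and 1) rather than on start and end.

```python
import itertools
import operator
from collections import namedtuple

# Hypothetical record type; the field names are assumptions for this sketch.
Ping = namedtuple("Ping", "num start end call_type rtt_ms")

def mergepings(pings):
    """Collapse duplicate pings into one, averaging their RTTs."""
    first = pings[0]
    # Take the first non-empty end time, since a duplicate row may leave it blank.
    end = next((p.end for p in pings if p.end), "")
    mean = sum(p.rtt_ms for p in pings) / len(pings)
    # Ping3_1 -> Ping3
    return first._replace(end=end, call_type=first.call_type.split("_")[0], rtt_ms=mean)

def merge(listofpings):
    key = operator.attrgetter("num", "start")
    merged = []
    # Sort first so that equal keys are adjacent, which groupby requires.
    for _, grp in itertools.groupby(sorted(listofpings, key=key), key=key):
        lst = list(grp)
        merged.append(mergepings(lst) if len(lst) > 1 else lst[0])
    return merged

pings = [
    Ping(2, "13:14:58.955", "13:15:29.984", "Ping2", 40),
    Ping(3, "13:19:12.754", "13:19:14.757", "Ping3_1", 620),
    Ping(3, "13:19:12.754", "", "Ping3_2", 210),
    Ping(4, "13:14:58.955", "13:15:29.984", "Ping4", 360),
]
result = merge(pings)
```

Returning a list instead of calling emitping keeps the sketch self-contained; swapping the append for an emit call would bring it back in line with the code above.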

Assuming the duplicates are adjacent, you can use a generator like this. I guess you already have some code to average the pings.

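def average_pings(ping1, ping2):
    pass  # placeholder: return a single row that averages the two pings

def merge_pings(seq):
    prev_key = prev_item = None
    for item in seq:
        # The first two columns (row number and start time) identify a ping
        key = item.split()[:2]
        if key == prev_key:
            # Duplicate of the pending row: fold it in rather than yielding both
            prev_item = average_pings(prev_item, item)
        else:
            # Key changed: the pending row is complete, so emit it
            if prev_item is not None:
                yield prev_item
            prev_key, prev_item = key, item
    # Emit the last pending row
    if prev_item is not None:
        yield prev_item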
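One possible way to fill in average_pings for whitespace-separated text rows like those in the question is sketched below. It assumes that at least one of the two rows carries all five columns, that the Info column always looks like RTT(Avr):<n>ms, and that the merged Call_Type is whatever precedes the underscore; none of this is confirmed by the question.

```python
import re

# Matches the Info column, e.g. "RTT(Avr):620ms"
RTT_RE = re.compile(r"RTT\(Avr\):(\d+)ms")

def average_pings(row1, row2):
    """Merge two text rows that share a row number and start time."""
    # Use whichever row has all five columns as the template
    # (assumption: at least one of them does).
    full = row1 if len(row1.split()) == 5 else row2
    num, start, end, call_type, _ = full.split()
    rtts = [int(RTT_RE.search(row).group(1)) for row in (row1, row2)]
    mean = sum(rtts) // len(rtts)        # integer mean; truncates any fraction
    base = call_type.split("_")[0]       # Ping3_1 -> Ping3
    return f"{num}  {start}  {end}  {base:<9}  RTT(Avr):{mean}ms"

row = average_pings(
    "3  13:19:12.754  13:19:14.757  Ping3_1    RTT(Avr):620ms",
    "3  13:19:12.754                Ping3_2    RTT(Avr):210ms",
)
```

Averaging pairwise like this gives an exact mean only when a key appears twice; with three or more duplicates you would want to collect the RTTs and average them once.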

I'm not sure how your data is structured, so I'll assume a list of dicts for duck typing purposes.

I'm also assuming the real primary key of your dataset is Start.

for i in range(len(dataset)-1):
  # Detect duplicates, assuming they are sorted properly
  if dataset[i]["Start"] == dataset[i+1]["Start"]:
    # Merge 'em
    dataset[i+1] = merge(dataset[i], dataset[i+1])

    # Deleting items from the array you are iterating over is a bad idea
    dataset[i] = None

dataset = [item for item in dataset if item is not None]  # so just delete them later

...where merge would be the function that actually does the merging.

Not elegant, C-ish, but probably better than what you are currently using.
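For the merge function mentioned above, a minimal sketch for dict rows might look like this. Only the "Start" key comes from the answer; "End", "Type" and "RTT" are hypothetical key names chosen for the example, with RTT assumed to be already parsed into a number.

```python
def merge(a, b):
    """Return a new dict combining two rows that share the same "Start"."""
    merged = dict(a)                               # copy so neither input is mutated
    merged["End"] = a.get("End") or b.get("End")   # keep whichever end time is present
    merged["Type"] = a["Type"].split("_")[0]       # Ping3_1 -> Ping3
    merged["RTT"] = (a["RTT"] + b["RTT"]) / 2      # mean of the two round-trip times
    return merged

a = {"Start": "13:19:12.754", "End": "13:19:14.757", "Type": "Ping3_1", "RTT": 620}
b = {"Start": "13:19:12.754", "End": None, "Type": "Ping3_2", "RTT": 210}
merged = merge(a, b)
```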

They're not sorted?

dataset.sort(key=lambda x: x["Start"])

Now they should be.




