Pythonic way to optimize the logic to filter/extract data from a list

I have a list like the following:

['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
 '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
 '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))',
 '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))',
 '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))',
 '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))',
 '13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())',
 '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))',
 '18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))',
 '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())',
 '22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

From this list I want three different lists as the result, and I want to build them with a single iteration over the list.

  1. list of all UIDs, i.e. [3234, 3235, 3236, 3237, 3241, ...]
  2. list of Seen UIDs, i.e. [3234, 3235, ...] <-- UIDs of items that have the Seen flag
  3. list of Deleted UIDs, i.e. [3236, 3253] <-- UIDs of items that have the Deleted flag

Best answer

The best approach is to convert your data into a dict that maps each UID to its FLAGS; searching it then becomes easy. The data would look like this:

{'3254': '', '3304': '', '3236': '\\Deleted', '3237': '-FLAGS \\Seen +FLAGS',
 '3234': 'seen \\Seen', '3235': '\\Seen', '3430': '\\Seen', '3431': '',
 '3252': '\\Seen', '3253': '\\Deleted', '3478': '', '3479': '',
 '3256': '\\Seen', '3481': '', '3480': '', '3318': '\\Seen',
 '3434': '\\Seen', '3243': '\\Seen', '3242': '\\Seen', '3241': '-FLAGS \\Seen +FLAGS',
 '3247': '\\Seen', '3245': '\\Seen', '3244': '\\Seen', '3447': '-FLAGS \\Seen +FLAGS'}

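You can do this by matching each entry in the list with a regular expression. If the regex returns two groups per match (the UID and the flags), we can easily build the dict from them.

So we end up with something like this: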

items = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
         '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
         '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))',
         '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))',
         '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))',
         '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))',
         '13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())',
         '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))',
         '18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))',
         '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())',
         '22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

import re
# Group 1 captures the UID, group 2 the text inside the FLAGS parentheses.
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")
# Each match yields a (uid, flags) pair, which dict() turns into a mapping.
values = dict(pattern.match(item).groups() for item in items)
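
To make the dict construction concrete: for a single entry, `groups()` returns a `(uid, flags)` pair, and `dict()` accepts an iterable of such pairs. A small sketch, assuming the pattern captures the UID and the text inside the FLAGS parentheses as its two groups (the two-item `sample` list is only illustrative). Note that the keys come out as strings, not integers.

```python
import re

# Two capture groups: the UID digits and the text inside FLAGS (...).
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")

# Illustrative subset of the question's data.
sample = [
    "1 (UID 3234 FLAGS (seen \\Seen))",
    "13 (UID 3254 FLAGS ())",
]

# One match gives one (uid, flags) tuple.
first = pattern.match(sample[0]).groups()
print(first)    # ('3234', 'seen \\Seen')

# dict() consumes the stream of (uid, flags) tuples.
values = dict(pattern.match(item).groups() for item in sample)
print(values)   # {'3234': 'seen \\Seen', '3254': ''}
```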

Then we can easily query the items in the dict to get what you want:

print("All UIDs:", list(values.keys()))
print("Seen UIDs:", [uid for uid, flags in values.items() if "Seen" in flags])
print("Deleted UIDs:", [uid for uid, flags in values.items() if "Deleted" in flags])

Other answers
import re

data = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
        '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
        '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))',
        '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))',
        '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))',
        '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))',
        '13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())',
        '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))',
        '18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))',
        '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())',
        '22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

# Named groups: "uid" is the UID, "data" is the text inside the FLAGS parentheses.
r = re.compile(r'\d+\s\(UID\s(?P<uid>\d+)\sFLAGS\s\((?P<data>.*)\)\)')
uid_list = []
seen_uid_list = []
deleted_uid_list = []
for s in data:
    m = r.match(s)
    if m:
        uid_list.append(m.group('uid'))
        # Use "in" rather than rfind(...) > 0, which would miss a match at index 0.
        if 'Seen' in m.group('data'):
            seen_uid_list.append(m.group('uid'))
        if 'Deleted' in m.group('data'):
            deleted_uid_list.append(m.group('uid'))

print(uid_list)
print(seen_uid_list)
print(deleted_uid_list)

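I'm not sure about list comprehensions here, since they usually map one list to another (with filtering or mapping); I haven't seen them used to split a list. However, you can do this in a single iteration with a combination of a generator expression and a loop. I've spread it out a bit so that it is clear.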

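import re
grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS (?P<flags>\(.*\))\)')

t = [...]  # your list

# Lazily turn each entry into a dict with the keys "uid" and "flags".
items = (grepper.search(m).groupdict() for m in t)

all_uids = []  # named all_uids so the built-in all() is not shadowed
seen = []
deleted = []
for i in items:
    # Test the captured flags text, not the dict itself (that would only check its keys).
    if "Seen" in i["flags"]:
        seen.append(i["uid"])
    if "Deleted" in i["flags"]:
        deleted.append(i["uid"])
    all_uids.append(i["uid"])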

You should now have your three lists.

# split(' ', 4) keeps the whole flags part together in a[-1]; a[2] is the UID.
all_uids, deleted, seen = [list(filter(None, a)) for a in
    zip(*map(lambda a: (a[2], 'Deleted' in a[-1] and a[2], 'Seen' in a[-1] and a[2]),
             map(lambda a: a.split(' ', 4), items)))]

Whether this is faster with re or without re is something you need to check with timeit!
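
One way to run that check is a small timeit comparison. This is a minimal sketch, assuming a few sample entries in the question's format and two illustrative helpers, `with_re` and `without_re` (neither name comes from the answers). Timings depend on the machine and the data, so no result is claimed here.

```python
import re
import timeit

# Illustrative sample in the question's format, repeated to get a measurable workload.
sample = [
    "1 (UID 3234 FLAGS (seen \\Seen))",
    "3 (UID 3236 FLAGS (\\Deleted))",
    "13 (UID 3254 FLAGS ())",
] * 100

pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")

def with_re(items):
    """Single pass using a compiled regular expression."""
    all_uids, seen, deleted = [], [], []
    for item in items:
        uid, flags = pattern.match(item).groups()
        all_uids.append(uid)
        if "Seen" in flags:
            seen.append(uid)
        if "Deleted" in flags:
            deleted.append(uid)
    return all_uids, seen, deleted

def without_re(items):
    """Single pass using str.split only."""
    all_uids, seen, deleted = [], [], []
    for item in items:
        # parts: [sequence number, "(UID", uid, "FLAGS", "(flags...))"]
        parts = item.split(" ", 4)
        all_uids.append(parts[2])
        if "Seen" in parts[4]:
            seen.append(parts[2])
        if "Deleted" in parts[4]:
            deleted.append(parts[2])
    return all_uids, seen, deleted

# Both variants should agree before their speed is compared.
assert with_re(sample) == without_re(sample)

re_time = timeit.timeit(lambda: with_re(sample), number=200)
split_time = timeit.timeit(lambda: without_re(sample), number=200)
print(f"re: {re_time:.4f}s  split: {split_time:.4f}s")
```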

This one works for your data sample:

uids, seen, deleted = [], [], []
for item in myList:
    # Fixed offsets: the UID sits at characters 7-11, the flags start around character 20.
    uids.append(int(item[7:12]))
    if 'Se' in item[20:]:
        seen.append(uids[-1])
    elif 'De' in item[20:]:
        deleted.append(uids[-1])

all = []
seen = []
deleted = []
for item in alist:
    # Split into at most five parts so that s[-1] holds the whole flags section.
    s = item.split(" ", 4)
    all.append(s[2])
    if "seen" in s[-1].lower():
        seen.append(s[2])
    elif "delete" in s[-1].lower():
        deleted.append(s[2])

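The only way I can think of to generate the three lists you asked for in a single iteration is to iterate manually. I can't think of any Python magic for it.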

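If you know the details of the format and how it is generated, you can easily improve on this. For example, I don't know why +FLAGS and -FLAGS appear in some items, nor when the parentheses are used, so I had to fall back on find(). I could also have split the string in two, but then again, I don't know what the flags format means.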

def parseList(l):
    lall = []
    lseen = []
    ldeleted = []

    for item in l:
        spl = item.split()

        uid = int(spl[2])

        lall.append(uid)

        for word in spl[4:]:
            if word.find("Seen") != -1:
                lseen.append(uid)

            elif word.find("Deleted") != -1:
                ldeleted.append(uid)

    return lall, lseen, ldeleted
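
If the flag list is assumed to be a whitespace-separated sequence of tokens inside the final pair of parentheses (an assumption about the format that the question does not confirm), the substring search can be replaced by exact token matching. A minimal sketch under that assumption follows; `ENTRY`, `parse_entry` and `split_by_flag` are hypothetical names, and a lowercase `seen` token is deliberately not treated as the `\Seen` flag.

```python
import re

# Assumed shape of one entry: "<seq> (UID <uid> FLAGS (<token> <token> ...))".
ENTRY = re.compile(r"\d+ \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]*)\)\)")

def parse_entry(entry):
    """Return (uid, set of flag tokens), or None if the entry does not match."""
    match = ENTRY.match(entry)
    if match is None:
        return None
    return int(match.group("uid")), set(match.group("flags").split())

def split_by_flag(entries):
    """Build the three lists in one pass, matching flag tokens exactly."""
    all_uids, seen, deleted = [], [], []
    for entry in entries:
        parsed = parse_entry(entry)
        if parsed is None:
            continue  # skip entries that do not have the assumed shape
        uid, flags = parsed
        all_uids.append(uid)
        if "\\Seen" in flags:
            seen.append(uid)
        if "\\Deleted" in flags:
            deleted.append(uid)
    return all_uids, seen, deleted

# Illustrative subset of the question's data.
sample = [
    "1 (UID 3234 FLAGS (seen \\Seen))",
    "4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))",
    "12 (UID 3253 FLAGS (\\Deleted))",
    "13 (UID 3254 FLAGS ())",
]
all_uids, seen, deleted = split_by_flag(sample)
print(all_uids, seen, deleted)
```

Storing the tokens in a set keeps each membership test exact, so a future flag whose name merely contains "Seen" would not be counted by accident.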



