Iterate through a large dictionary in Python

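I wrote this function to merge tax values according to a specific logic. It iterates through the tax dictionary, looking for keys that share a country code and have overlapping values. When such keys are found, their values are merged and the duplicate key is removed from the dictionary.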

def merge_tax_values_new_logic(tax_dict):
    treated_list = set()
    while True:
        changed = False
        for key1, value1 in list(tax_dict.items()):
            country_code = key1[-2:]
            print("current list :", tax_dict)
            if key1 not in treated_list:
                print("current iteration key :", key1)
                for key2, value2 in list(tax_dict.items()):
                    if key2.endswith(country_code) and key1 != key2 and any(hl_id in value2 for hl_id in value1):
                        tax_dict[key1].extend(value2)
                        tax_dict.pop(key2)
                        tax_dict[key1] = list(set(tax_dict[key1]))
                        changed = True
                        print("current key :", key1, "matched with key :", key2, "state of the dict after the pop :", tax_dict)
                        break
            treated_list.add(key1)
            print("treated list :", treated_list)
            print("******************************")
            if changed:
                break
        if not changed:
            break
    return tax_dict

Example:

new_tax_dict = {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
merge_tax_values_new_logic(new_tax_dict)

Result:

    current list : {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
    current iteration key : tax1_US
    current key : tax1_US matched with key : tax3_US state of the dict after the pop : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    treated list : {'tax1_US'}
    ******************************
    current list : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    treated list : {'tax1_US'}
    ******************************
    current list : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    current iteration key : tax2_US
    current key : tax2_US matched with key : tax1_US state of the dict after the pop : {'tax2_US': ['A', 'B']}
    treated list : {'tax2_US', 'tax1_US'}
    ******************************
    current list : {'tax2_US': ['A', 'B']}
    treated list : {'tax2_US', 'tax1_US'}
    ******************************
    {'tax2_US': ['A', 'B']}

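It works fine on small dicts with a few keys. However, when the function is run on a dict with a large number of keys (40k+ keys, with an average of 5 elements per key), performance really becomes a problem.

Do you see any alternatives?

Regards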

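Best answer

What you are looking for is the connected components of a graph. The `networkx` library makes this easy. I tried to measure the speed, but the sample data is too small to give useful results. I am only guessing that it is faster than the original approach: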

import networkx as nx
from itertools import repeat

def merge_tax_values_new_logic(tax_dict):
    graph = nx.Graph()

    representation = {}

    for key1, value1 in tax_dict.items():
        country_code = key1[-2:]
        nodes = list(zip(value1, repeat(country_code)))
        for n in nodes:
            representation[n] = key1
        graph.add_nodes_from(nodes)

        if len(value1) > 1:
            first = (value1[0], country_code)
            graph.add_edges_from(zip(repeat(first), zip(value1[1:], repeat(country_code))))

    result = {}
    
    for comps in nx.connected_components(graph):
        representer = representation[next(iter(comps))]
        result[representer] = [value for value, _ in comps]
        
    print(result)


new_tax_dict = {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
merge_tax_values_new_logic(new_tax_dict)

new_tax_dict = {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B'], 'tax4_US': ['Z']}
merge_tax_values_new_logic(new_tax_dict)

The output is not identical to the desired result (a different key survives), but I think it is correct:

{'tax3_US': ['B', 'A']}
{'tax3_US': ['B', 'A'], 'tax4_US': ['Z']}
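
If installing networkx is not an option, the same connected-components idea can be sketched with a small union-find (disjoint-set) structure in plain Python. This is a different technique from the networkx code above, not a drop-in copy of it. The names merge_with_union_find, find and first_owner are made up for illustration. The sketch assumes the country code is the last two characters of the key and that the values are hashable and sortable. Which key survives for each merged group is arbitrary.

```python
def merge_with_union_find(tax_dict):
    """Merge keys that share a country code and, directly or transitively, a value."""
    parent = {}  # union-find forest over the dictionary keys

    def find(key):
        # Follow parent links up to the root, shortening the path as we go.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    first_owner = {}  # (country_code, value) -> first key seen holding that value
    for key, values in tax_dict.items():
        parent[key] = key
        country_code = key[-2:]  # assumption: code is the last two characters
        for value in values:
            owner = first_owner.setdefault((country_code, value), key)
            root_a, root_b = find(owner), find(key)
            if root_a != root_b:
                parent[root_b] = root_a  # union the two groups

    # Collect the values of every key under the root of its group.
    merged = {}
    for key, values in tax_dict.items():
        merged.setdefault(find(key), set()).update(values)
    return {key: sorted(values) for key, values in merged.items()}


print(merge_with_union_find({"tax1_US": ["A"], "tax2_US": ["B"], "tax3_US": ["A", "B"]}))
```

Each value is looked at only a small number of times, so this should scale much better than the nested loops in the question, although I have not measured it on 40k keys.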

Other answers

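If you want to handle the merging efficiently, use the country code as the key, so that entries for the same country land in the same place in O(1).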

If you want to merge a bunch of lists efficiently, try using a set instead of a list. It supports an efficient union operation.
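
To illustrate the point about sets, here is a small sketch with made-up values (the variable names are for illustration only). It contrasts merging two lists, which needs a separate deduplication step, with an in-place set union.

```python
# Merging lists needs an explicit deduplication step afterwards.
list_a = ["A", "B"]
list_b = ["B", "C"]
merged_list = list(set(list_a + list_b))  # builds a new list and a new set

# Sets merge and deduplicate in one in-place union.
set_a = {"A", "B"}
set_b = {"B", "C"}
set_a |= set_b  # set_a now holds A, B and C exactly once

# isdisjoint() tests for overlap without building an intermediate set.
has_overlap = not set_a.isdisjoint({"C", "Z"})

print(sorted(merged_list), sorted(set_a), has_overlap)
```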

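I would not throw keys away for no reason. Instead, I would build a country data class that remembers everything. That way I can use a dictionary to map each country to all the data for that country.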

from collections import defaultdict


def country_code(key: str) -> str:
    """ get the country code for a key """
    return key[-2:]


class CountryData:
    """ efficiently merge country data """

    def __init__(self, key=None, values=()):
        self.keys = [] if key is None else [key, ]
        self.values = set(values)

    def __iadd__(self, other):
        # merge in place and return self, so `temp += ...` keeps a valid reference
        self.keys.extend(other.keys)
        self.values |= other.values
        return self


def merge_tax_values_new_logic(tax_dict):
    """ accumulate country data """
    country_data = defaultdict(CountryData)
    for key, values in tax_dict.items():
        temp = country_data[country_code(key)]
        temp += CountryData(key, values)
    return country_data


tax_dict = {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
country_data = merge_tax_values_new_logic(tax_dict)
for country, data in country_data.items():
    print("country", country)
    print("    keys:", data.keys)
    print("    values:", list(data.values))

country US
    keys: ['tax1_US', 'tax2_US', 'tax3_US']
    values: ['B', 'A']

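I can rework this if you like. Some details of the class are there only to make it work more smoothly (for example, the default arguments let it be used as a defaultdict factory). But the general idea is to use the country code as the key and collect all the information for each country.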

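I understand your frustration; nested `for` loops can get very complicated.

You have a very common problem: you iterate through a list, and inside that loop there is another nested `for` loop iterating through the same list again. In this kind of problem the running time can reach O(n^2) (where n is the number of keys in the dictionary). The most common solution is to sort the keys, which removes the need for a nested `for` loop over the same list.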
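
To make the n^2 point concrete, here is a rough sketch that only counts how many key pairs would be compared. The helper name count_pair_checks and the synthetic keys are made up for illustration, and the country code is assumed to be the last two characters of the key.

```python
from collections import Counter


def count_pair_checks(keys):
    """Return (pairs checked by nested loops, pairs checked within country groups)."""
    n = len(keys)
    nested = n * (n - 1)  # every key compared with every other key
    sizes = Counter(key[-2:] for key in keys).values()
    grouped = sum(size * (size - 1) for size in sizes)  # only within one country
    return nested, grouped


# 40,000 synthetic keys spread evenly over four hypothetical country codes
keys = [f"tax{i}_{cc}" for cc in ("US", "FR", "DE", "JP") for i in range(10_000)]
print(count_pair_checks(keys))
```

Grouping by country alone only divides the work by the number of countries. Remembering which key last held each value, as in the solution below, removes the pairwise comparison inside a group as well.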

Solution:

I would recommend getting a sorted list of the tax keys, sorted by their last two characters: keys = sorted(data.keys(), key=lambda x: x[-2:])

Then you can iterate through all the keys once, since they are now grouped by country.


def merge_tax_values_new_logic(tax_dict):

    def get_country(key: str) -> str:
        return key[-2:]
    
    # sorting keys and thus grouping keys by country code
    keys = sorted(tax_dict, key=get_country)

    # maps: value -> key
    vals_map = {}
    group_country = None
    for key in keys:

        # start processing new country group
        if get_country(key) != group_country:
            group_country = get_country(key)
            vals_map = {}


        for val in tax_dict[key]:
            # if we have seen one of the values before, merge the old tax code into the current one
            prev_key = vals_map.get(val, None)
            if prev_key in tax_dict and key != prev_key:
                # found processed tax code with the same value
                # merge tax code to current
                tax_dict[key].extend(tax_dict.pop(prev_key))
        
        # depending on the data, you might want to deduplicate tax_dict[key] somewhere here

        # update information about seen values
        for val in tax_dict[key]:            
            vals_map[val] = key

        
    return tax_dict
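
The sort-then-group step in the function above can also be written with itertools.groupby, which yields runs of adjacent items that share a key once the input is sorted. This is a minimal sketch, not a replacement for the merge logic: group_keys_by_country is a hypothetical helper name, the sample data is made up, and the country code is again assumed to be the last two characters of the key.

```python
from itertools import groupby


def group_keys_by_country(tax_dict):
    """Return {country_code: [keys...]} by sorting once and grouping adjacent runs."""
    def get_country(key):
        return key[-2:]

    # groupby only groups adjacent items, so the keys must be sorted first
    sorted_keys = sorted(tax_dict, key=get_country)
    return {
        country: list(group)
        for country, group in groupby(sorted_keys, key=get_country)
    }


sample = {"tax1_US": ["A"], "tax9_FR": ["A"], "tax2_US": ["B"]}
print(group_keys_by_country(sample))
```

Each group can then be processed on its own with a fresh value-to-key map, which is what resetting vals_map does in the function above.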



