Question

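I wrote this function to merge tax values according to a specific logic. It iterates over the tax dictionary, looking for keys that share a country code and have overlapping values. When such keys are found, their values are merged and the duplicate key is removed from the dictionary.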

def merge_tax_values_new_logic(tax_dict):
    treated_list = set()
    while True:
        changed = False
        for key1, value1 in list(tax_dict.items()):
            country_code = key1[-2:]
            print("current list :", tax_dict)
            if key1 not in treated_list:
                print("current iteration key :", key1)
                for key2, value2 in list(tax_dict.items()):
                    if key2.endswith(country_code) and key1 != key2 and any(hl_id in value2 for hl_id in value1):
                        tax_dict[key1].extend(value2)
                        tax_dict.pop(key2)
                        tax_dict[key1] = list(set(tax_dict[key1]))
                        changed = True
                        print("current key :", key1, "matched with key :", key2, "state of the dict after the pop :", tax_dict)
                        break
            treated_list.add(key1)
            print("treated list :", treated_list)
            print("******************************")
            if changed:
                break
        if not changed:
            break
    return tax_dict

Example:

new_tax_dict = {"tax1_US": ["A"], "tax2_US": ["B"], "tax3_US": ["A", "B"]}
merge_tax_values_new_logic(new_tax_dict)

Result:

    current list : {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
    current iteration key : tax1_US
    current key : tax1_US matched with key : tax3_US state of the dict after the pop : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    treated list : {'tax1_US'}
    ******************************
    current list : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    treated list : {'tax1_US'}
    ******************************
    current list : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    current iteration key : tax2_US
    current key : tax2_US matched with key : tax1_US state of the dict after the pop : {'tax2_US': ['A', 'B']}
    treated list : {'tax2_US', 'tax1_US'}
    ******************************
    current list : {'tax2_US': ['A', 'B']}
    treated list : {'tax2_US', 'tax1_US'}
    ******************************
    {'tax2_US': ['A', 'B']}

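It works fine on a small dict with a few keys. However, when the function runs on a dict with a large number of keys (40k+ keys, with an average of 5 elements per key), performance really becomes a problem.

Do you see any other options?

Regards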

Answer 1

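What you want is the connected components of a graph. The networkx library makes this easy. I tried to measure the speed, but the sample data is too small to give useful results. I am only guessing that it is faster than the original approach: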

import networkx as nx
from itertools import repeat

def merge_tax_values_new_logic(tax_dict):
    graph = nx.Graph()

    representation = {}

    for key1, value1 in tax_dict.items():
        country_code = key1[-2:]
        nodes = list(zip(value1, repeat(country_code)))
        for n in nodes:
            representation[n] = key1
        graph.add_nodes_from(nodes)

        if len(value1) > 1:
            first = (value1[0], country_code)
            graph.add_edges_from(zip(repeat(first), zip(value1[1:], repeat(country_code))))

    result = {}
    
    for comps in nx.connected_components(graph):
        representer = representation[next(iter(comps))]
        result[representer] = [value for value, _ in comps]
        
    print(result)


new_tax_dict = {"tax1_US": ["A"], "tax2_US": ["B"], "tax3_US": ["A", "B"]}
merge_tax_values_new_logic(new_tax_dict)

new_tax_dict = {"tax1_US": ["A"], "tax2_US": ["B"], "tax3_US": ["A", "B"], "tax4_US": ["Z"]}
merge_tax_values_new_logic(new_tax_dict)

The output is not exactly the desired result, but I think it is correct:

{'tax3_US': ['B', 'A']}
{'tax3_US': ['B', 'A'], 'tax4_US': ['Z']}
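If adding a dependency is not an option, the same connected-components idea can be sketched with a small union-find (disjoint-set) structure in plain Python. This is only an illustration, not the answer's method: the function name `merge_by_union_find` and the helper names are made up here, and, as with the networkx version, which key ends up representing a merged group is an implementation detail rather than something the question specifies.

```python
from collections import defaultdict


def merge_by_union_find(tax_dict):
    """Merge keys that share a country code and, directly or transitively, a value."""
    # Every key starts out as the root of its own group.
    parent = {key: key for key in tax_dict}

    def find(key):
        # Walk up to the root, shortening the path along the way.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    # (country_code, value) -> first key seen holding that value
    first_owner = {}
    for key, values in tax_dict.items():
        country_code = key[-2:]
        for value in values:
            owner = first_owner.setdefault((country_code, value), key)
            root_a, root_b = find(owner), find(key)
            if root_a != root_b:
                parent[root_b] = root_a  # union the two groups

    # Collect the values of every key under its group's root.
    merged = defaultdict(set)
    for key, values in tax_dict.items():
        merged[find(key)].update(values)
    return {key: sorted(values) for key, values in merged.items()}


sample = {"tax1_US": ["A"], "tax2_US": ["B"], "tax3_US": ["A", "B"], "tax4_US": ["Z"]}
result = merge_by_union_find(sample)
print(result)
```

Each value is touched once and the union-find operations are close to constant time, so the work grows roughly linearly with the total number of values instead of with the square of the number of keys.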

Answer 2

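If you want to merge things efficiently, use the country code as the key, so that entries for the same country land in the same slot in O(1).

If you want to merge a bunch of lists efficiently, try using a set instead of a list. It supports efficient union operations.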
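A tiny illustration of the difference, using made-up values: with lists the merge and the deduplication are two separate steps, while a set union does both at once and can also be applied in place.

```python
# Hypothetical values, only to contrast the two approaches.
list_a, list_b = ["A", "B"], ["B", "C"]

# List approach: concatenate, then deduplicate in a second pass.
merged_list = list(set(list_a + list_b))

# Set approach: the union deduplicates as part of the merge.
set_a, set_b = set(list_a), set(list_b)
merged_set = set_a | set_b

# Overlap test without building an intermediate collection.
overlaps = not set_a.isdisjoint(set_b)

# In-place union avoids creating a new set on every merge.
set_a |= set_b

print(sorted(merged_list), sorted(merged_set), overlaps)
```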

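I would not want to simply throw the keys away for no reason, so I would instead create a country data class that remembers everything. That way I can use a dictionary to map each country to all of the data for that country.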

from collections import defaultdict


def country_code(key: str) -> str:
    """ get the country code for a key """
    return key[-2:]


class CountryData:
    """ efficiently merge country data """

    def __init__(self, key=None, values=()):
        self.keys = [] if key is None else [key, ]
        self.values = set(values)

    def __iadd__(self, other):
        # In-place merge; returning self keeps `+=` from rebinding the name to None.
        self.keys.extend(other.keys)
        self.values |= other.values
        return self


def merge_tax_values_new_logic(tax_dict):
    """ accumulate country data """
    country_data = defaultdict(CountryData)
    for key, values in tax_dict.items():
        temp = country_data[country_code(key)]
        temp += CountryData(key, values)
    return country_data


tax_dict = {"tax1_US": ["A"], "tax2_US": ["B"], "tax3_US": ["A", "B"]}
country_data = merge_tax_values_new_logic(tax_dict)
for country, data in country_data.items():
    print("country", country)
    print("    keys:", data.keys)
    print("    values:", list(data.values))

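country US
    keys: ['tax1_US', 'tax2_US', 'tax3_US']
    values: ['B', 'A']

I can rework this if you want. Some details of the class are only there to make it work more smoothly without special cases. But the general idea here is to use the country code as the key and collect all the information for each country.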

Answer 3

I understand your frustration. Optimizing for loops can be very tricky.

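You have a very common problem: you iterate over a collection, and inside that loop there is a nested for loop that iterates over the same collection again. In problems like this the running time can reach O(n^2), where n is the number of keys in the dictionary. The most common fix is to sort the keys so that the nested for loop over the same collection is no longer needed.

Solution:

I would recommend getting a list of tax keys sorted by their last two characters: keys = sorted(data.keys(), key=lambda x: x[-2:])

Then you can walk through the keys in a single pass, with the keys already grouped by country.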


def merge_tax_values_new_logic(tax_dict):

    def get_country(key: str) -> str:
        return key[-2:]
    
    # sorting keys and thus grouping keys by country code
    keys = sorted(tax_dict, key=get_country)

    # maps: value -> key
    vals_map = {}
    group_country = None
    for key in keys:

        # start processing new country group
        if get_country(key) != group_country:
            group_country = get_country(key)
            vals_map = {}


        for val in tax_dict[key]:
            # if we have seen one of these values before, merge the old tax code into the current one
            prev_key = vals_map.get(val, None)
            if prev_key in tax_dict and key != prev_key:
                # found processed tax code with the same value
                # merge tax code to current
                tax_dict[key].extend(tax_dict.pop(prev_key))
        
        # depending on the data, you might want to deduplicate tax_dict[key] somewhere here

        # update information about seen values
        for val in tax_dict[key]:            
            vals_map[val] = key

        
    return tax_dict
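The function above relies on the sorted keys of one country sitting next to each other, which is what lets it reset vals_map whenever the country code changes. A minimal sketch that makes this grouping visible with itertools.groupby follows; the sample data, including the FR key, is hypothetical and only there for illustration.

```python
from itertools import groupby


def get_country(key):
    """Return the two-character country suffix of a tax key."""
    return key[-2:]


# Hypothetical sample with two countries, for illustration only.
sample = {"tax1_US": ["A"], "tax9_FR": ["A"], "tax2_US": ["B"], "tax3_US": ["A", "B"]}

# groupby only groups adjacent items, which is why the keys are sorted first.
keys = sorted(sample, key=get_country)
groups = {country: list(group) for country, group in groupby(keys, key=get_country)}
print(groups)
```

Because sorted is stable, keys within one country keep their original relative order, so the single pass sees them in the same order as the input dict.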
