Question

The bounty expires in 7 days. Answers to this question are eligible for a +50 reputation bounty. E. Zeytinci wants to draw more attention to this question.

I have a data structure as follows,

data = {
     0_0 : [( 0_0 , 0), ( 0_1 , 1), ( 0_2 , 2)],
     0_1 : [( 0_0 , 1), ( 0_1 , 0), ( 0_2 , 1)],
     0_2 : [( 0_0 , 2), ( 0_1 , 1), ( 0_2 , 0)],
}

How to delete tuple from lists by given key? For example I wrote a function as follows,

def remove_all_acs(
    dictionary,
    delete_ac
):
    for key in dictionary:
        acs = dictionary[key]
        for ac in acs:
            if ac[0] == delete_ac:
                acs.remove(ac)
                break

    return dictionary

其结果如下:

print(remove_all_acs(data,  0_0 ))
{
     0_0 : [( 0_1 , 1), ( 0_2 , 2)],
     0_1 : [( 0_1 , 0), ( 0_2 , 1)],
     0_2 : [( 0_1 , 1), ( 0_2 , 0)]
}

It works but is not effective on large lists. Do you have any idea? Thanks in advance.

此外,你还可以使用该代码制作大量数据集。

def generate(size, max_connections):
    data = {}
    keys = [f {i}_{j}  for i in range(size) for j in range(size)]
    
    for key in keys:
        connections = random.randint(1, min(max_connections, 10))
        data[key] = [
            (random.choice(keys), random.randint(1, 10)) 
            for _ in range(connections)
        ]
    
    return data

data = generate(200, 10)

Answer 1

同你一样,你试图从图表中删除一个节点,以及所有相关的链接。

Unfortunately, without changing the datastructure, I don t see a faster algorithm.

删除可以人工操作,单独保存一个藏匿处,例如与删除项目有关的一组项目don t,并在删除时绕过这些项目。当然,在插入新的联系时,也必须对这些切身进行更新。

Just in case, this other SO question refers two good libraries to store graphs:

Answer 2

检索和删除tuple with ANY,其第二数值在一些测试中似乎比以前快:

from unittest.mock import ANY

def remove_all_acs(
    dictionary,
    delete_ac
):
    delete = delete_ac, ANY
    for acs in dictionary.values():
        if delete in acs:
            acs.remove(delete)
    return dictionary

Answer 3

我知道,在你提到的评论中,你无法改变数据结构,但我将在以下前提上把答案放在任何方面:

Someone else might come to this question in a position that they CAN decide and select a better data structure for a scenario like this;
Maybe yourself, after seeing the improvement in results with just a slight change in the data structure, could re-evaluate this constraint.

如果外部字典的价值是插图L的括号()内,那么你可以取得的最佳表现至少与L型号(找到匹配)成正比,而且可能再次与你在删除一个要素之后重新编号。你目前的解决办法几乎是好,可能无法执行一个比较复杂的算法(如增加切身),改善几 %。

从理论上讲,你可以做一些比较容易的事情是,在<条码>(超标)和改动<条码>上填入<条码>>。但是,即便如此,这实际上也可能不会更快,因为清单。拆除工作非常优化。

现在,我实际上建议把你的字典价值从<条码>[图[指示,]改为<条码>。在关键词中找到一个项目,与删除该项目一样,是一种持续的时间行动,比在清单中这样做要好得多。如果你有许多内在联系,而不是10个最大联系,那么这种区别就更加明显。

def generate(size, max_connections):
    data = {}
    keys = [f {i}_{j}  for i in range(size) for j in range(size)]
    
    for key in keys:
        # fixing a small bug, so `max_connections` is respected
        connections = random.randint(1, max_connections)
        # dict constructor, instead of list
        data[key] = {
            random.choice(keys): random.randint(1, 10)
            for _ in range(connections)
        }
    
    return data

def remove_all_acs(dictionary, delete_ac):
    for acs in dictionary:
        acs = dictionary[key]
        # `check for containment` and `delete` are FAST
        if delete_ac in acs:
            del acs[delete_ac]
    return dictionary

That would be the minimal intervention needed, and with just that, even with just 10 connections on each node, you should get 3x faster results.

见下文一些基准:

def test(data, n):
    times = []
    for _ in range(n):
        ac = f"{random.randint(0, len(data) - 1)}_{random.randint(0, len(data) - 1)}"
        start = time.time()
        remove_all_acs(data, ac)
        end = time.time()
        times.append(end - start)
    print(n, "runs, total time:", sum(times))

test(generate(200, 10), 1000)   # 10 connections
# Your original solution: 1000 runs, total time: 16.1 s
# Changing list to dict:  1000 runs, total time:  5.1 s

test(generate(200, 100), 1000)  # 100 connections
# Your original solution: 1000 runs, total time: 71.7 s
# Changing list to dict:  1000 runs, total time:  9.4 s

test(generate(200, 1000), 1000) # 1000 connections
# Your original solution: 1000 runs, total time: (I gave up on waiting)
# Changing list to dict:  1000 runs, total time: 14.1 s

Answer 4

def remove_all_acs_now(dictionary, delete_ac):
    for key in dictionary:
        acs = dictionary[key]
        for ac in acs:
            if ac[0] == delete_ac:
                acs.remove(ac)
                break

    return dictionary

def remove_all_acs(dictionary, delete_ac):
    for i in dictionary.values(): # .items() / .values() are the fastest ways for dicts, always!
        for q in (i):
            if q[0] == delete_ac:
                i.remove(q)
                break
            
data=generate(200, 10)
from copy import deepcopy
remove_all_acs(dictionary=deepcopy(data), delete_ac="0_0")
remove_all_acs_now(dictionary=deepcopy(data), delete_ac="0_0")

%timeit deepcopy(data) # needed, because we are modifying the original value
502 ms ± 1.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit remove_all_acs(dictionary=deepcopy(data), delete_ac="0_0")
539 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
539 - 502 = 37 ms

%timeit remove_all_acs_now(dictionary=deepcopy(data), delete_ac="0_0")
559 ms ± 44.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
559 - 502 = 57 ms

Further improvements: The function doesn t need to return anything, since you are modifying the original dict anyway. You wouldn t even need to pass the dict to the function if it is available in the function s scope (probably a tiny bit faster).

然而,正如我在评论中指出的那样:试图利用热带风暴或Numpy(或两者兼有)进行真正的(10x或更快的)改进。

Edit - Test with cython - 6 times faster

Cython is at least 6 times faster. The less there is to delete, the faster it is. The speed of Cython s loops is insane. If there is no matching value (which means nothing to delete), it is about 50 times faster.

from cythondicttest import deletefomdict
import random
from copy import deepcopy


def generate(size, max_connections):
    data = {}
    keys = [f"{i}_{j}" for i in range(size) for j in range(size)]

    for key in keys:
        connections = random.randint(1, min(max_connections, 10))
        data[key] = [
            (random.choice(keys), random.randint(1, 10)) for _ in range(connections)
        ]

    return data


def remove_all_acs_now(dictionary, delete_ac):
    for key in dictionary:
        acs = dictionary[key]
        for ac in acs:
            if ac[0] == delete_ac:
                acs.remove(ac)
                break

    return dictionary


data = generate(500, 100)

# %timeit deepcopy(data)
# 3.16 s ± 27.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# %timeit deletefomdict(dictionary=deepcopy(data), delete_ac="0_0")
# 3.2 s ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# CYTHON:
# 3200 ms - 3160 ms = 40 ms

# %timeit remove_all_acs_now(dictionary=deepcopy(data), delete_ac="0_0")
# 3.41 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 3410 ms - 3160 ms = 250 ms

做不到很多工作:

创建PYX文档cythondicttest.pyx,将其与Cython汇编成册,你准备去!

# important: use optimized compiler directives
# https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#compiler-directives
cimport cython
import cython

cpdef int deletefomdict(dict[str,list[tuple[str,int]]] dictionary, str delete_ac):
    cdef:
        Py_ssize_t loop
    for i in dictionary.values():
            loop=len(i)
            for qq in range(loop):
                if i[qq][0] == delete_ac:
                    del i[qq]
                    break
    return 0

如果你以前从未使用过气旋,你会把这些指令(我的违约环境)作为导向,但你会取得更好的结果,使这里和那里的环境遭到破坏。

optionsdict = {
    "Options.docstrings": False,
    "Options.embed_pos_in_docstring": False,
    "Options.generate_cleanup_code": False,
    "Options.clear_to_none": True,
    "Options.annotate": True,
    "Options.fast_fail": False,
    "Options.warning_errors": False,
    "Options.error_on_unknown_names": True,
    "Options.error_on_uninitialized": True,
    "Options.convert_range": True,
    "Options.cache_builtins": True,
    "Options.gcc_branch_hints": True,
    "Options.lookup_module_cpdef": False,
    "Options.embed": False,
    "Options.cimport_from_pyx": False,
    "Options.buffer_max_dims": 8,
    "Options.closure_freelist_size": 8,
}
configdict = {
    "define_macros": [
        ("NPY_NO_DEPRECATED_API", 1),
        ("NPY_1_7_API_VERSION", 1),
        ("CYTHON_USE_DICT_VERSIONS", 1),
        ("CYTHON_FAST_GIL", 1),
        ("CYTHON_USE_PYLIST_INTERNALS", 1),
        ("CYTHON_USE_UNICODE_INTERNALS", 1),
        ("CYTHON_ASSUME_SAFE_MACROS", 1),
        ("CYTHON_USE_TYPE_SLOTS", 1),
        ("CYTHON_USE_PYTYPE_LOOKUP", 1),
        ("CYTHON_USE_ASYNC_SLOTS", 1),
        ("CYTHON_USE_PYLONG_INTERNALS", 1),
        ("CYTHON_USE_UNICODE_WRITER", 1),
        ("CYTHON_UNPACK_METHODS", 1),
        ("CYTHON_USE_EXC_INFO_STACK", 1),
        ("CYTHON_ATOMICS", 1),
    ],
    "undef_macros": [],
    "library_dirs": [],
    "libraries": [],
    "runtime_library_dirs": [],
    "extra_objects": [],
    "extra_compile_args": ["/O2", "/Oy"],
    "extra_link_args": [],
    "export_symbols": [],
    "swig_opts": [],
    "depends": [],
    "language": "c",
    "optional": None,
}

compiler_directives = {
    "binding": True,
    "boundscheck": False,
    "wraparound": False,
    "initializedcheck": False,
    "nonecheck": False,
    "overflowcheck": False,
    "overflowcheck.fold": True,
    "embedsignature": False,
    "embedsignature.format": "c",  # (c / python / clinic)
    "cdivision": True,
    "cdivision_warnings": False,
    "cpow": True,
    "always_allow_keywords": False,
    "c_api_binop_methods": False,
    "profile": False,
    "linetrace": False,
    "infer_types": True,
    "language_level": 3,  # (2/3/3str)
    "c_string_type": "bytes",  # (bytes / str / unicode)
    "c_string_encoding": "default",  # (ascii, default, utf-8, etc.)
    "type_version_tag": False,
    "unraisable_tracebacks": True,
    "iterable_coroutine": True,
    "annotation_typing": True,
    "emit_code_comments": True,
    "cpp_locals": False,
    "legacy_implicit_noexcept": False,
    "optimize.use_switch": True,
    "optimize.unpack_method_calls": True,
    "warn.undeclared": False,  # (default False)
    "warn.unreachable": True,  # (default True)
    "warn.maybe_uninitialized": False,  # (default False)
    "warn.unused": False,  # (default False)
    "warn.unused_arg": False,  # (default False)
    "warn.unused_result": False,  # (default False)
    "warn.multiple_declarators": True,  # (default True)
    "show_performance_hints": True,  # (default True)
}

友情链接