Question

I am investigating a list of funding bodies, and I need to sort out which are United States federal funders and keep them. I am using the ROR ID of each funder to identify it and the ROR REST API to return details about each.

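My question is how to write a function that recursively investigates each parent relationship of a funder, and each parent of that parent, and so on, following the chain all the way up until I can see whether the top level is "Government of the United States of America" (or not).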

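For example, take the Delaware Space Grant Consortium (API response). It is two levels removed from the top-level US Govt, which I can see by clicking through manually: Delaware -> NASA -> US Govt.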

The recursive function should try the first parent relationship it finds (in this case, NASA), look that ROR ID up in the API (https://api.ror.org/organizations/027ka1x80), see if it has parents, if it does, investigate those, and so on. If it turns out this did not provide a path up to USGovt, come back and try the next parent (so in this example, if NASA turned out to be a dead end, come back and try a path through the University of Delaware).

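I'm trying to think my way through this but I'm getting overwhelmed at the number of possible branches. Each time the program picks one parent to investigate, it also leaves behind other parents that could offer an alternative path.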

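How do I make sure it exhaustively investigates all possible branches? How can I keep track of which parent of the bottom-level funder we're investigating (for example NASA versus the University of Delaware) and come back to try the next one?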

import requests
url = 'https://api.ror.org/organizations/010jszw77'
api_response = requests.get(url)
parsed_response = api_response.json()

num_relationships = len(parsed_response['relationships'])  # 2 relationships in this example, both parents
print(f"Found {num_relationships} relationships")

for i in range(0, num_relationships):
    if(parsed_response['relationships'][i]['type']) == "Parent":
        print("parent")
        if parsed_response['relationships'][i]['label'] = "Government of the United States of America":
            USFF = True
        else:
            run_up_chain_to_USGovt(parsed_response['relationships'][i]['id'])
            #kick off recursive run up the chain
    else:
        1 #not a parent, do nothing


def run_up_chain_to_USGovt(rorid):
    
    # get passed a ROR ID
    # see if it has a parent of USGovt
    # if not, investigate a parent and so on
    
    local_url = 'https://api.ror.org/organizations/' + rorid
    local_api_response = requests.get(local_url)
    local_parsed_response = local_api_response.json()
    
    # local_parsed_response['relationships'] is a new list. Have to keep track of which index we're on
    # if the first parent isn't USGovt, try the id of that one. Call run_up_chain_to_USGovt again

    #eventually, if no more parents
        #if local_parsed_response['id'] == 'https://ror.org/02rcrvv70'   # USGovt
           # Found a path to top level USGovt
        #else:
           # No path to USGovt 
           # Go back to the start and try a different parent
    
    return

Answer 1

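You say you're "getting overwhelmed at the number of possible branches" and ask how to "make sure it exhaustively investigates all possible branches".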

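This is exactly what a recursive solution is for. It lets you search through a very large collection using relatively short and simple code and, if you write it well, it will eventually cover all of that collection.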

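There are downsides too, which partly explains why you would want a depth-first search here: instead of first downloading information for many organisations, you check for a solution as soon as possible, since all you want to know is whether an organisation has the US govt. as a (grand)parent, not all of its parents or how many there are.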

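Another downside in your specific case could be that many organisations have the same parents, but your approach would download the data every time. So you will probably want to remember anything you have already downloaded. If you get to an organisation a second time, it is either what you need (it has the US govt. as a parent), or you were already done with it.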
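A minimal sketch of that remembering step. The helper name make_cached_fetcher is made up for illustration, and fake_fetch is only a stand-in for the real API call so the example runs without a network connection:

```python
from typing import Callable


def make_cached_fetcher(fetch: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a download function so each organisation is fetched only once."""
    cache: dict[str, dict] = {}

    def cached_fetch(org_id: str) -> dict:
        if org_id not in cache:
            # first visit: do the (expensive) download and remember the result
            cache[org_id] = fetch(org_id)
        return cache[org_id]

    return cached_fetch


# stand-in for the real API call (an assumption); it records every download
calls = []


def fake_fetch(org_id: str) -> dict:
    calls.append(org_id)
    return {'id': org_id, 'relationships': []}


fetch_org = make_cached_fetcher(fake_fetch)
fetch_org('027ka1x80')
fetch_org('027ka1x80')  # second request is served from the cache
print(len(calls))       # number of actual downloads
```

In real use you would pass in a function that calls requests.get and returns the parsed JSON instead of fake_fetch.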

It would also help to avoid mistakes in the code you're trying to improve. Your code has a typo:

        # needs ==, not =
        if parsed_response['relationships'][i]['label'] == "Government of the United States of America":

The order of your code means that you're calling your function before it has been defined.

Finally, it's unclear what you expect the following to achieve:

        1  # not a parent, do nothing

To do nothing, use the pass statement.

With these problems fixed:

import requests

url = 'https://api.ror.org/organizations/010jszw77'
api_response = requests.get(url)
parsed_response = api_response.json()

num_relationships = len(parsed_response['relationships'])  # 2 relationships in this example, both parents
print(f"Found {num_relationships} relationships")


def run_up_chain_to_USGovt(rorid):
    local_url = 'https://api.ror.org/organizations/' + rorid
    local_api_response = requests.get(local_url)
    local_parsed_response = local_api_response.json()

    return


for i in range(0, num_relationships):
    if (parsed_response['relationships'][i]['type']) == "Parent":
        print("parent")
        if parsed_response['relationships'][i]['label'] == "Government of the United States of America":
            USFF = True
        else:
            run_up_chain_to_USGovt(parsed_response['relationships'][i]['id'])
    else:
        pass

Another problem to point out: parsed_response['relationships'][i]['id'] actually returns a URL. As a result, the request goes to something like https://api.ror.org/organizations/https://ror.org/027ka1x80, whereas it seems to me that https://api.ror.org/organizations/027ka1x80 was intended.
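One way to get from that URL to the bare ID is to keep only the part after the last slash. This is just a sketch; the helper name ror_id_from_url and the API_BASE constant are made up for illustration:

```python
def ror_id_from_url(value: str) -> str:
    """Return the bare ROR ID from either a full ROR URL or a bare ID."""
    # drop a trailing slash, then keep whatever follows the last slash
    return value.rstrip('/').rsplit('/', 1)[-1]


API_BASE = 'https://api.ror.org/organizations/'

parent_url = 'https://ror.org/027ka1x80'  # the shape of relationship['id']
print(API_BASE + ror_id_from_url(parent_url))
# prints https://api.ror.org/organizations/027ka1x80
```

Because a bare ID contains no slash, passing one in returns it unchanged, so the helper can be applied to either form.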

What you really want is for your function to work recursively:

load the data for an organisation
for each of its relationships
- check if it is a parent
  - if so, check if it is the US govt.
    - if so, we're done - the organisation has the US govt. as a parent, return True
    - if not, call the recursive function to first check all of its parents
    - if the result is True, this organisation has it as a grandparent, return True
if we didn't return True already, none were the US govt. or had it as a (grand)parent, return False

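And we should remember not to "call the recursive function" if we checked that organisation before and it did not have the US govt. as a (grand)parent. That is only the case if we returned False for it, so we should store the ID of the organisation just before returning False, to remember not to check it again.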
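To see the search order and the remembering at work without touching the API, here is a rough sketch on made-up data. The PARENTS mapping, the single-letter names and the function name reaches are all hypothetical; in the real thing the parents come from the API response:

```python
# hypothetical parent links: organisation -> list of its parents
PARENTS = {
    'A': ['B', 'C'],    # A has two parents
    'B': ['D'],         # this branch turns out to be a dead end
    'C': ['D', 'GOV'],  # the second branch reaches the target
    'D': [],
    'GOV': [],
}


def reaches(org: str, target: str, dead_ends: set[str]) -> bool:
    """Depth-first search up the parent links; True if target is an ancestor."""
    for parent in PARENTS.get(org, []):
        if parent == target:
            return True
        # skip parents we already know lead nowhere
        if parent not in dead_ends and reaches(parent, target, dead_ends):
            return True
    # every parent was tried and none led to the target
    dead_ends.add(org)
    return False


dead_ends: set[str] = set()
print(reaches('A', 'GOV', dead_ends))  # True, found via C after B failed
print(sorted(dead_ends))               # ['B', 'D']
```

Note that D is only explored once: when the search gets to it again through C, it is already known to be a dead end.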

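In your original code, you load the details of an organisation twice, once outside and once inside the function, but you can simply call the recursive function with the original organisation ID and check the result.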

All of that together:

import requests
from requests import Response


def has_US_govt_parent(org_id: str, visited: list[str] = None) -> bool:
    """
    Returns True if the organization with the given ROR ID has a parent organization
    that is the Government of the United States of America, False otherwise.
    """
    # start with an empty cache
    if visited is None:
        visited = []

    # - load the data for an organisation
    print(f'Loading data for {org_id}')
    url = f'https://api.ror.org/organizations/{org_id}'
    api_response: Response = requests.get(url)
    parsed_response = api_response.json()

    # - for each of its relationships
    # you don't need to count the relationships and index them, you can just loop over the members
    for relationship in parsed_response['relationships']:
        # - check if it is a parent
        if relationship['type'] == "Parent":
            # -if so, check if it is the US government
            if relationship['label'] == "Government of the United States of America":
                return True
            else:
                # assuming that the last part without slashes is the actual ID, there are many ways to do this
                parent_id = relationship['id'].split('/')[-1]
                # - call the recursive function if we haven't checked this organisation before, otherwise don't
                if parent_id in visited:
                    grandparent_US_govt = False
                else:
                    print(f'Need to check parents of {relationship["label"]}')
                    grandparent_US_govt = has_US_govt_parent(parent_id, visited)
                # - if the result is True, this organisation has it as a grandparent, return True
                if grandparent_US_govt:
                    return True
        else:
            pass

    # remember we already visited this one, it will always be False
    visited.append(org_id)

    # - if we didn't return True already, none were the US govt. or had it as a (grand)parent, return False
    return False


if has_US_govt_parent('010jszw77'):
    print('010jszw77 has a US government parent')
else:
    print('010jszw77 does not have a US government parent')

Output:

Loading data for 010jszw77
Need to check parents of National Aeronautics and Space Administration
Loading data for 027ka1x80
010jszw77 has a US government parent

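Note that if you're going to run this kind of search often, you may want to build a quick lookup table and update it periodically, rather than making all these requests every time. Or, if you run many of these searches from a single program, at least keep visited around, so that subsequent searches are a lot faster.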

Keeping track of visited outside the function, and using it a bit more cleverly to check whether an organisation was already visited:

import requests
from requests import Response


def has_US_govt_parent(org_id: str, visited: dict[str, bool] = None) -> bool:
    """
    Returns True if the organization with the given ROR ID has a parent organization
    that is the Government of the United States of America, False otherwise.
    """
    # start with an empty cache
    if visited is None:
        visited = {}

    # just check it here, and we're done
    if org_id in visited:
        return visited[org_id]

    # - load the data for an organisation
    print(f'Loading data for {org_id}')
    url = f'https://api.ror.org/organizations/{org_id}'
    api_response: Response = requests.get(url)
    parsed_response = api_response.json()

    # assume we won't find the US govt.
    result = False

    # - for each of its relationships
    # you don't need to count the relationships and index them, you can just loop over the members
    for relationship in parsed_response['relationships']:
        # - check if it is a parent
        if relationship['type'] == "Parent":
            # -if so, check if it is the US government
            if relationship['label'] == "Government of the United States of America":
                result = True
                break
            else:
                # assuming that the last part without slashes is the actual ID, there are many ways to do this
                parent_id = relationship['id'].split('/')[-1]
                # - call the recursive function
                print(f'Need to check parents of {relationship["label"]}')
                if has_US_govt_parent(parent_id, visited):
                    result = True
                    break
        else:
            pass

    # remember the result for this organisation, whether it is True or False
    visited[org_id] = result

    # - result is only True if a parent was the US govt. or had it as a (grand)parent
    return result


visited = {}

if has_US_govt_parent('010jszw77', visited):
    print('010jszw77 has a US government parent')
else:
    print('010jszw77 does not have a US government parent')

print('Data for these organisations were retrieved: ', visited)

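This no longer just keeps track of the organisations that already returned False; it simply remembers every result in a dictionary and uses that cache for subsequent calls.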

Now you can also do:

visited = {}
orgs = ['01s3dpm97', '010jszw77', '028b18z22']
for org in orgs:
    print(f'Checking {org}')
    if has_US_govt_parent(org, visited):
        print(f'{org} has a US government parent')
    else:
        print(f'{org} does not have a US government parent')
print('Visited: ', visited)

Output:

Checking 01s3dpm97
Loading data for 01s3dpm97
Need to check parents of University of Alabama in Huntsville
Loading data for 02zsxwr40
Need to check parents of University of Alabama System
Loading data for 051fvmk98
01s3dpm97 does not have a US government parent
Checking 010jszw77
Loading data for 010jszw77
Need to check parents of National Aeronautics and Space Administration
Loading data for 027ka1x80
010jszw77 has a US government parent
Checking 028b18z22
Loading data for 028b18z22
Need to check parents of National Aeronautics and Space Administration
028b18z22 has a US government parent
Visited:  {'051fvmk98': False, '02zsxwr40': False, '01s3dpm97': False, '027ka1x80': True, '010jszw77': True, '028b18z22': True}
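If you want the lookup table to survive between runs, one option is to store the dictionary of results in a small JSON file. This is only a sketch: the helper names load_visited and save_visited and the file name are assumptions, and a temporary directory stands in for wherever you would really keep the file:

```python
import json
import tempfile
from pathlib import Path


def load_visited(path: Path) -> dict[str, bool]:
    """Read earlier results, or start empty if there is no cache file yet."""
    if path.exists():
        return json.loads(path.read_text(encoding='utf-8'))
    return {}


def save_visited(visited: dict[str, bool], path: Path) -> None:
    """Write the results so that a later run can reuse them."""
    path.write_text(json.dumps(visited, indent=2), encoding='utf-8')


# usage, with a temporary directory standing in for a real location
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / 'visited.json'  # hypothetical file name
    visited = load_visited(cache_file)       # empty on the first run
    visited['010jszw77'] = True              # a result from the output above
    save_visited(visited, cache_file)
    print(load_visited(cache_file))          # {'010jszw77': True}
```

Relationships in the registry can change, so a stored table like this would need refreshing from time to time.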

Answer 2

Here is what my algorithm does:

I treat the organizations as nodes in a tree; each node has parent, sibling, or related nodes
Each node has a node ID, e.g. 010jszw77
The meat of this algorithm is in function has_usgov_as_parent(). Given a node ID, it will return true if the node itself or any parent up the tree is a US gov node.
I believe I put enough comments in has_usgov_as_parent() to make the code easy to understand.

Here is a sample run for a node where it is found:

Testing node Delaware Space Grant Consortium, id=010jszw77
Testing node National Aeronautics and Space Administration, id=027ka1x80
Direct parent of National Aeronautics and Space Administration is US Gov
Up-the-chain parent of Delaware Space Grant Consortium is US Gov
Found it now

Here is another sample run, where it is not found:

Testing node Alfred I. duPont Hospital for Children, id=00jyx0v10
Testing node Nemours Children's Health System, id=01mzw6m29
Does not have US Gov as parent: Nemours Children's Health System
Does not have US Gov as parent: Alfred I. duPont Hospital for Children
Not found

Here is the code:

import requests

SESSION = requests.Session()
USGOV_ID = "https://ror.org/02rcrvv70"


def get_id(url):
    """Given a node ID or a URL, return the ID part."""
    return url.split("/")[-1]


def get_info(node_id):
    """Given a node ID, (e.g. 01sbq1a82), return the details."""
    node_id = get_id(node_id)
    url = f"https://api.ror.org/organizations/{node_id}"
    response = SESSION.get(url)
    return response.json()


def has_usgov_as_parent(node_id):
    """
    Returns True if this node, or any parent up the chain is a US Gov,
    return False otherwise.
    """
    node = get_info(node_id)
    print(f"Testing node {node['name']}, id={get_id(node['id'])}")

    if node["id"] == USGOV_ID:
        # Case: This node is US gov
        print("Found it")
        return True
    
    for sub_node in node["relationships"]:
        if sub_node["type"] != "Parent":
            # This is not a parent, skip
            continue

        if sub_node["id"] == USGOV_ID:
            # This parent node is a US gov
            print(f"Direct parent of {node['name']} is US Gov")
            return True
        
        if has_usgov_as_parent(sub_node["id"]):
            # This parent node is not a US gov, but somewhere
            # up the chain, we found a US gov node
            print(f"Up-the-chain parent of {node['name']} is US Gov")
            return True

    print(f"Does not have US Gov as parent: {node['name']}")
    return False


node_id = "00jyx0v10"
if has_usgov_as_parent(node_id):
    print("Found it now")
else:
    print("Not found")
