How to join components of a path when you are constructing a URL in Python
  • Time: 2009-11-24 22:06:07
  • Tags:
  • python
  • url

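For example, I want to join a prefix path to resource paths like js/foo.js.

I want the resulting path to be relative to the root of the server. In the above example, if the prefix was "media", I would want the result to be media/js/foo.js.

os.path.join does this really well, but how it joins paths is OS dependent. In this case I know I am targeting the web, not the local file system.

Is there a best alternative when you are working with paths you know will be used in URLs? Will os.path.join work well enough? Should I just roll my own?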

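Best answer

Since, from the comments the OP posted, it seems he does not want to preserve "absolute URLs" in the join (which is one of the key jobs of urlparse.urljoin), I'd recommend avoiding that. os.path.join would also be bad, for exactly the same reason.

So I'd use something like '/'.join(s.strip('/') for s in pieces), where pieces holds the path components, if the leading / must also be ignored. If the leading piece must be special-cased, that is of course feasible too.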
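A minimal sketch of that approach is shown below. The helper name join_url_path and the sample pieces are illustrative assumptions, not part of the answer, and dropping empty pieces is an extra choice made here so that blank components do not produce doubled slashes.

```python
def join_url_path(*pieces):
    """Join URL path pieces with single slashes.

    Leading and trailing slashes on every piece are ignored.
    """
    # Strip slashes from both ends of each piece
    stripped = (str(p).strip("/") for p in pieces)
    # Skip pieces that are empty after stripping (an assumption of this sketch)
    return "/".join(s for s in stripped if s)


print(join_url_path("media", "js/foo.js"))     # media/js/foo.js
print(join_url_path("/media/", "/js/foo.js"))  # media/js/foo.js
```

If the result has to be rooted at the server, a leading "/" can be prepended by the caller.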

Answers

You can use urllib.parse.urljoin:

>>> from urllib.parse import urljoin
>>> urljoin('/media/path/', 'js/foo.js')
'/media/path/js/foo.js'

But beware:

>>> urljoin('/media/path', 'js/foo.js')
'/media/js/foo.js'
>>> urljoin('/media/path', '/js/foo.js')
'/js/foo.js'

The reason you get different results from /js/foo.js and js/foo.js is that the former begins with a slash, which signifies that it already starts at the website root.
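One possible way to sidestep the trailing-slash difference is to normalise both arguments before calling urljoin. This is only a sketch under the assumption that the base should always be treated as a directory; the helper name join_under is made up for illustration.

```python
from urllib.parse import urljoin


def join_under(base, relative):
    """Join `relative` beneath `base`, treating `base` as a directory."""
    # Make the base end with exactly one slash so its last segment is kept
    base = base.rstrip("/") + "/"
    # Drop leading slashes so the relative part is not resolved from the root
    return urljoin(base, relative.lstrip("/"))


print(join_under("/media/path", "js/foo.js"))    # /media/path/js/foo.js
print(join_under("/media/path/", "/js/foo.js"))  # /media/path/js/foo.js
```

Note that a second argument carrying its own scheme (for example "http://...") would still be treated by urljoin as an absolute URL.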

On Python 2, you have to do:

from urlparse import urljoin

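As you say, os.path.join joins paths based on the current OS. posixpath is the underlying module that is used on POSIX systems under the namespace os.path:

>>> import os, posixpath
>>> os.path.join is posixpath.join
True
>>> posixpath.join('/media/', 'js/foo.js')
'/media/js/foo.js'

So you can just import and use posixpath.join instead for URLs; it is available and will work on any platform.

Edit: @Pete's suggestion is a good one: you can alias the import for increased readability.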

from posixpath import join as urljoin
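Before relying on posixpath.join for URLs, it may help to check how it treats components that start with a slash. The short sketch below uses only the standard library; the sample paths are illustrative.

```python
import posixpath

# Relative components are appended with a single slash
print(posixpath.join("/media", "js/foo.js"))   # /media/js/foo.js

# A component starting with "/" is treated as absolute and
# discards everything that came before it
print(posixpath.join("/media", "/js/foo.js"))  # /js/foo.js

# Trailing slashes are kept but not doubled
print(posixpath.join("/media/", "js/"))        # /media/js/
```

So, like urljoin, it does not simply concatenate when a later component is absolute.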

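Edit: I think this is made clearer, or at least it helped me understand, if you look at the source of os.py (the code here is from Python 2.7.11, and I've trimmed some bits). There are conditional imports in os.py that pick which path module to use in the namespace os.path. All the underlying modules (posixpath, ntpath, os2emxpath, riscospath) that may be imported in os.py, aliased as path, exist and can be used on all systems. os.py is just picking one of the modules to use in the namespace os.path at run time, based on the current OS.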

# os.py
import sys, errno

_names = sys.builtin_module_names

if 'posix' in _names:
    # ...
    from posix import *
    # ...
    import posixpath as path
    # ...

elif 'nt' in _names:
    # ...
    from nt import *
    # ...
    import ntpath as path
    # ...

elif 'os2' in _names:
    # ...
    from os2 import *
    # ...
    if sys.version.find('EMX GCC') == -1:
        import ntpath as path
    else:
        import os2emxpath as path
        from _emx_link import link
    # ...

elif 'ce' in _names:
    # ...
    from ce import *
    # ...
    # We can use the standard Windows path.
    import ntpath as path

elif 'riscos' in _names:
    # ...
    from riscos import *
    # ...
    import riscospath as path
    # ...

else:
    raise ImportError, 'no os specific module found'

This does the job nicely:

def urljoin(*args):
    """
    Joins given arguments into an url. Trailing but not leading slashes are
    stripped for each argument.
    """

    return "/".join(map(lambda x: str(x).rstrip('/'), args))
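Because this helper strips only trailing slashes, an argument that begins with a slash still produces a doubled slash. The sketch below repeats the function so it runs on its own; the example URLs are illustrative.

```python
def urljoin(*args):
    """Same helper as above: strips trailing slashes only."""
    return "/".join(map(lambda x: str(x).rstrip("/"), args))


print(urljoin("http://a.com/", "api/", "v1"))    # http://a.com/api/v1
print(urljoin("http://a.com/", "/api/", "v1/"))  # http://a.com//api/v1
print(urljoin("http://a.com/items", 42))         # http://a.com/items/42
```

The last call works because every argument is passed through str() first.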

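I found things not to like about all the above solutions, so I came up with my own. This version makes sure parts are joined with a single slash and leaves leading and trailing slashes alone. No pip install, no urllib.parse.urljoin weirdness.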

In [1]: from functools import reduce

In [2]: def join_slash(a, b):
   ...:     return a.rstrip('/') + '/' + b.lstrip('/')
   ...:

In [3]: def urljoin(*args):
   ...:     return reduce(join_slash, args) if args else ''
   ...:

In [4]: parts = ['https://foo-bar.quux.net', '/foo', 'bar', '/bat/', '/quux/']

In [5]: urljoin(*parts)
Out[5]: 'https://foo-bar.quux.net/foo/bar/bat/quux/'

In [6]: urljoin('https://quux.com/', '/path', 'to/file///', '//here/')
Out[6]: 'https://quux.com/path/to/file/here/'

In [7]: urljoin()
Out[7]: ''

In [8]: urljoin('//', 'beware', 'of/this///')
Out[8]: '/beware/of/this///'

In [9]: urljoin('/leading', 'and/', '/trailing/', 'slash/')
Out[9]: '/leading/and/trailing/slash/'

The basejoin function in the urllib package might be what you're looking for.

basejoin = urljoin(base, url, allow_fragments=True)
    Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter.

Edit: I didn't notice it before, but basejoin seems to map directly onto urljoin, which makes the latter preferable.

Using furl (pip install furl), it would be:

 furl.furl('/media/path/').add(path='js/foo.js')

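I know this is a bit more than the OP asked for, but I had the pieces of the following URL and was looking for a simple way to join them:

>>> url = 'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

Some looking around: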

>>> import urlparse
>>> split = urlparse.urlsplit(url)
>>> split
SplitResult(scheme='https', netloc='api.foo.com', path='/orders/bartag', query='spamStatus=awaiting_spam&page=1&pageSize=250', fragment='')
>>> type(split)
<class 'urlparse.SplitResult'>
>>> dir(split)
['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_asdict', '_fields', '_make', '_replace', 'count', 'fragment', 'geturl', 'hostname', 'index', 'netloc', 'password', 'path', 'port', 'query', 'scheme', 'username']
>>> split[0]
'https'
>>> split = (split[:])
>>> type(split)
<type 'tuple'>

So, in addition to the path joining that has already been covered in the other answers, I did the following to get what I was looking for:

>>> split
('https', 'api.foo.com', '/orders/bartag', 'spamStatus=awaiting_spam&page=1&pageSize=250', '')
>>> unsplit = urlparse.urlunsplit(split)
>>> unsplit
'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

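According to the documentation, it takes EXACTLY a 5-part tuple.

With the following tuple format:

Attribute   Index   Value                    Value if not present
scheme      0       URL scheme specifier     empty string
netloc      1       Network location part    empty string
path        2       Hierarchical path        empty string
query       3       Query component          empty string
fragment    4       Fragment identifier      empty string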
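The session above is Python 2. As a rough Python 3 sketch of the same round trip, the functions live in urllib.parse; the extra path segment "details" is a made-up example used only to show where a path join would slot in.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250"

# SplitResult is a named tuple: (scheme, netloc, path, query, fragment)
scheme, netloc, path, query, fragment = urlsplit(url)

# Append one more path segment with exactly one slash in between
new_path = path.rstrip("/") + "/" + "details"

# urlunsplit expects exactly five parts, in the order listed in the table
rebuilt = urlunsplit((scheme, netloc, new_path, query, fragment))
print(rebuilt)
# https://api.foo.com/orders/bartag/details?spamStatus=awaiting_spam&page=1&pageSize=250
```

Working on the path component alone keeps the query string and fragment untouched.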

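Rune Kaagaard provided a great and compact solution that worked for me, and I expanded on it a little:

def urljoin(*args):
    trailing_slash = '/' if args[-1].endswith('/') else ''
    return "/".join(map(lambda x: str(x).strip('/'), args)) + trailing_slash

This allows all arguments to be joined regardless of leading and trailing slashes, while preserving the last slash if present.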

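To improve slightly on Alex Martelli's response, the following will not only clean up extra slashes but also preserve the trailing (ending) slash, which can sometimes be useful: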

>>> items = ["http://www.website.com", "/api", "v2/"]
>>> url = "/".join([(u.strip("/") if index + 1 < len(items) else u.lstrip("/")) for index, u in enumerate(items)])
>>> print(url)
http://www.website.com/api/v2/

It's not as easy to read, though, and it won't clean up multiple extra trailing slashes.

How about this? It is somewhat efficient and somewhat simple. It only joins 2 parts of a URL path:

def UrlJoin(a, b):
    a, b = a.strip(), b.strip()
    a = a if a.endswith('/') else a + '/'
    b = b if not b.startswith('/') else b[1:]
    return a + b

Or: more conventional, but not as efficient if joining only 2 parts of a path.

def UrlJoin(*parts):
    return '/'.join([p.strip().strip('/') for p in parts])

Test cases:

>>> UrlJoin('https://example.com/', '/TestURL_1')
'https://example.com/TestURL_1'

>>> UrlJoin('https://example.com', 'TestURL_2')
'https://example.com/TestURL_2'

Note: I may be splitting hairs here, but it is at least good practice and potentially more readable.

Using furl and regex (Python 3)

>>> import re
>>> import furl
>>> p = re.compile(r'(/)+')
>>> url = furl.furl('/media/path').add(path='/js/foo.js').url
>>> url
'/media/path/js/foo.js'
>>> p.sub(r"\1", url)
'/media/path/js/foo.js'
>>> url = furl.furl('/media/path').add(path='js/foo.js').url
>>> url
'/media/path/js/foo.js'
>>> p.sub(r"\1", url)
'/media/path/js/foo.js'
>>> url = furl.furl('/media/path/').add(path='js/foo.js').url
>>> url
'/media/path/js/foo.js'
>>> p.sub(r"\1", url)
'/media/path/js/foo.js'
>>> url = furl.furl('/media///path///').add(path='//js///foo.js').url
>>> url
'/media///path/////js///foo.js'
>>> p.sub(r"\1", url)
'/media/path/js/foo.js'
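Applied to a full URL, a pattern like the one above would also collapse the "//" that follows the scheme. A standard-library sketch that avoids this by rewriting only the path component might look as follows; the helper name collapse_slashes and the example.com URL are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of slashes in the path component only."""
    parts = urlsplit(url)
    # Only the path is rewritten, so the "//" after the scheme is left alone
    clean_path = re.sub(r"/+", "/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


print(collapse_slashes("/media///path/////js///foo.js"))
# /media/path/js/foo.js
print(collapse_slashes("https://example.com//media///js/foo.js?x=1"))
# https://example.com/media/js/foo.js?x=1
```

A string that itself begins with "//" is parsed by urlsplit as a network location rather than a path, so this sketch is meant for rooted paths and full URLs.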

One-liner:

from functools import reduce
reduce(lambda x, y: '{}/{}'.format(x, y), parts)

where parts is, for example, ['https://api.somecompany.com/v1', 'climate', 'Rain']

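Here is a safe version that I'm using. It takes care of prefixes and trailing slashes. The trailing slash for the end URI is handled separately.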

def safe_urljoin(*uris) -> str:
    """
    Joins the URIs carefully considering the prefixes and trailing slashes.
    The trailing slash for the end URI is handled separately.
    """
    if len(uris) == 1:
        return uris[0]

    safe_urls = [
        f"{url.lstrip('/')}/" if not url.endswith("/") else url.lstrip("/")
        for url in uris[:-1]
    ]
    safe_urls.append(uris[-1].lstrip("/"))
    return "".join(safe_urls)

Output

>>> safe_urljoin("https://a.com/", "adunits/", "/both/", "/left")
'https://a.com/adunits/both/left'

>>> safe_urljoin("https://a.com/", "adunits/", "/both/", "right/")
'https://a.com/adunits/both/right/'

>>> safe_urljoin("https://a.com/", "adunits/", "/both/", "right/", "none")
'https://a.com/adunits/both/right/none'

>>> safe_urljoin("https://a.com/", "adunits/", "/both/", "right/", "none/")
'https://a.com/adunits/both/right/none/'

Yet another variation with unique features:

def urljoin(base: str, *parts: str) -> str:
    for part in filter(None, parts):
        base = '{}/{}'.format(base.rstrip('/'), part.lstrip('/'))
    return base
  • Preserve trailing slash in base or last part
  • Empty parts are ignored
  • For each non-empty part, remove trailing from base and leading from part and join with a single /
urljoin('http://a.com/api', '')  -> 'http://a.com/api'
urljoin('http://a.com/api', '/') -> 'http://a.com/api/'
urljoin('http://a.com/api/', '')  -> 'http://a.com/api/'
urljoin('http://a.com/api/', '/') -> 'http://a.com/api/'
urljoin('http://a.com/api/', '/a/', '/b', 'c', 'd/') -> 'http://a.com/api/a/b/c/d/'

OK, here is what I did, because I needed complete independence from predefined roots:

def url_join(base: str, *components: str, slash_left=True, slash_right=True) -> str:
    """Join two or more url components, inserting '/' as needed.
    Optionally, a slash can be added to the left or right side of the URL.
    """
    base = base.lstrip('/').rstrip('/')
    components = [component.lstrip('/').rstrip('/') for component in components]
    url = f"/{base}" if slash_left else base
    for component in components:
        url = f"{url}/{component}"
    return f"{url}/" if slash_right else url

url_join("http://whoops.io", "foo/", "/bar", "foo", slash_left=False)
# "http://whoops.io/foo/bar/foo/"
url_join("foo", "bar")
# "/foo/bar/"
