Python list directory, subdirectory, and files

I'm trying to make a script that lists all directories, subdirectories, and files in a given directory.

I tried this:

import sys, os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r, d, f in os.walk(path):
    for file in f:
        print(os.path.join(root, file))

Unfortunately it doesn't work properly. I get all the files, but not their complete paths.

For example, if the directory structure is:

/home/patate/directory/targetdirectory/123/456/789/file.txt

it will print:

/home/patate/directory/targetdirectory/file.txt

I need the first result.

Best answer

Use os.path.join to concatenate the directory and file name:

import os

for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))

Note the use of path instead of root in the concatenation, since using root would be incorrect.


In Python 3.4, the pathlib module (http://docs.python.org/dev/library/pathlib.html) was added to make path manipulation easier. The equivalent of os.path.join would then be:

pathlib.PurePath(path, name)

The advantage of pathlib is that you can use a variety of useful methods on paths. If you use the concrete Path variant, you can also do actual OS calls through them, such as changing into a directory, deleting the path, or opening the file it points to.
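As an illustration of the concrete Path class (a minimal sketch, reusing the path from the question), the whole recursive listing can also be written with Path.rglob:

from pathlib import Path

root = Path("/home/patate/directory/targetdirectory")

# Recursively print the full path of every file under root
for p in root.rglob("*"):
    if p.is_file():
        print(p)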

Other answers

In case you want to get all files inside a directory and its subdirectories matching some pattern (*.py for example):

import os
from fnmatch import fnmatch

root = "/some/directory"
pattern = "*.py"

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print(os.path.join(path, name))

Here is a one-liner:

import os

[val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk("./")] for val in sublist]
# Meta comment to ease selecting text

The outermost val for sublist ... loop flattens the nested lists into one dimension; the j loop collects each file basename and joins it to the current path; and finally the i loop iterates over all directories and subdirectories.

This example hard-codes the path in os.walk(...); you can supply any path string you like.
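Written out as plain loops (a sketch equivalent to the one-liner above), which may make the explanation easier to follow:

import os

all_files = []
for i in os.walk("./"):      # i is a (dirpath, dirnames, filenames) tuple
    for j in i[2]:           # j is a file basename
        all_files.append(os.path.join(i[0], j))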

Note: os.path.expanduser and/or os.path.expandvars can be used for path strings like ~/.
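For example (a small sketch; the home-relative path is hypothetical):

import os

# Expand ~ and any environment variables before walking the tree
root = os.path.expandvars(os.path.expanduser("~/projects"))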

Extending the original one-liner:

It is easy to add tests on the file basename and on the directory name.

For example, testing for *.jpg files:

... for j in i[2] if j.endswith(".jpg")] ...

Additionally, excluding the .git directory:

... for i in os.walk("./") if ".git" not in i[0].split("/")]
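Putting the two tests together (a sketch combining the *.jpg filter and the .git exclusion):

import os

jpgs = [os.path.join(i[0], j)
        for i in os.walk("./") if ".git" not in i[0].split("/")
        for j in i[2] if j.endswith(".jpg")]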

Another option would be to use the glob module from the standard library:

import glob

path = "/home/patate/directory/targetdirectory/**"

for path in glob.glob(path, recursive=True):
    print(path)
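Note that with recursive=True the ** pattern matches directories as well as files; if you only want files, they can be filtered out (a small sketch):

import glob
import os

pattern = "/home/patate/directory/targetdirectory/**"

# glob with recursive=True also returns directories; keep only regular files
files_only = [p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p)]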

If you need an iterator, you can use iglob as an alternative (my_path being the same recursive pattern as above):

for file in glob.iglob(my_path, recursive=True):
    # ...

A bit simpler one-liner:

import os
from itertools import product, chain

directory = "./"  # the directory to walk
chain.from_iterable([[os.sep.join(w) for w in product([i[0]], i[2])] for i in os.walk(directory)])

You can take a look at this sample I made. It uses the os.path.walk function, which is deprecated (and removed in Python 3). It uses a list to store all the file paths.

root = "Your root directory"
ex = ".txt"
where_to = "Wherever you wanna write your file to"

def fileWalker(ext, dirname, names):
       
    checks files in names   
    pat = "*" + ext[0]
    for f in names:
        if fnmatch.fnmatch(f, pat):
            ext[1].append(os.path.join(dirname, f))


def writeTo(fList):

    with open(where_to, "w") as f:
        for di_r in fList:
            f.write(di_r + "
")


if __name__ ==  __main__ :
    li = []
    os.path.walk(root, fileWalker, [ex, li])

    writeTo(li)
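Since os.path.walk no longer exists in Python 3, a rough equivalent using os.walk might look like this (a sketch, with file_walker as a hypothetical stand-in for fileWalker):

import os
import fnmatch

def file_walker(root, ext, out):
    """Append the full path of every file under root whose name matches *ext."""
    pat = "*" + ext
    for dirname, _subdirs, names in os.walk(root):
        for f in names:
            if fnmatch.fnmatch(f, pat):
                out.append(os.path.join(dirname, f))

li = []
file_walker("Your root directory", ".txt", li)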

Since every example here just uses walk (with join), I'd like to show a nice example and comparison with listdir:

import os, time

def listFiles1(root): # listdir
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0)+"/"; items = os.listdir(folder) # items = folders + files
        for i in items: i=folder+i; (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles2(root): # listdir/join (takes ~1.4x as long) (and uses "\\" instead)
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0); items = os.listdir(folder) # items = folders + files
        for i in items: i=os.path.join(folder,i); (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles3(root): # walk (takes ~1.5x as long)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[folder.replace("\\","/")+"/"+file] # folder+"\\"+file still ~1.5x
    return allFiles

def listFiles4(root): # walk/join (takes ~1.6x as long) (and uses "\\" instead)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[os.path.join(folder,file)]
    return allFiles


for i in range(100): files = listFiles1("src") # warm up

start = time.time()
for i in range(100): files = listFiles1("src") # listdir
print("Time taken: %.2fs"%(time.time()-start)) # 0.28s

start = time.time()
for i in range(100): files = listFiles2("src") # listdir and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.38s

start = time.time()
for i in range(100): files = listFiles3("src") # walk
print("Time taken: %.2fs"%(time.time()-start)) # 0.42s

start = time.time()
for i in range(100): files = listFiles4("src") # walk and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.47s

So, as you can see, the listdir version is the most efficient. (And join is slow.)

Just an addition to the above: with this you can get the data into CSV format:

import sys, os

try:
    import pandas as pd
except ImportError:
    os.system("pip3 install pandas")
    import pandas as pd

root = "/home/kiran/Downloads/MainFolder" # It may have many subfolders and files inside
lst = []
from fnmatch import fnmatch
pattern = "*.csv"      # I want to get only csv files
pattern = "*.*"        # Note: Use this pattern to get all types of files and folders
for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            lst.append((os.path.join(path, name)))
df = pd.DataFrame({"filePaths":lst})
df.to_csv("filepaths.csv")

A fairly simple solution is to run a couple of subprocess calls to export the files to CSV format:

import subprocess

# Global variables for directory being mapped

location = "."      # Enter the path here.
pattern = "*.py"    # Use this if you want to only return certain filetypes
rootDir = location.rpartition("/")[-1]
outputFile = rootDir + "_directory_contents.csv"

# Find the requested data and export to CSV, specifying a pattern if needed.
find_cmd = ("find " + location + " -name '" + pattern + "'" +
            " -fprintf " + outputFile + " '%Y%M,%n,%u,%g,%s,%A+,%P\\n'")
subprocess.call(find_cmd, shell=True)

This command produces comma-separated values that can be easily analyzed in Excel.

f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py

The resulting CSV file doesn't have a header row, but you can use a second command to add one.

# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)

Depending on how much data you get back, you can massage it further using pandas. Here are some things I found useful, especially if you're dealing with many levels of directories to dig through.

Add these imports:

import numpy as np
import pandas as pd

Then add this to your code:

# Create DataFrame from the CSV file created above.
df = pd.read_csv(outputFile)

# Format columns
# Get the filename and file extension from the filepath
df['FileName'] = df['FilePath'].str.rsplit("/", 1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit(".", 1).str[1]

# Get the full path to the files. If the path doesn't include a "/" it's the root directory
df['FullPath'] = df['FilePath'].str.rsplit("/", 1).str[0]
df['FullPath'] = np.where(df['FullPath'].str.contains("/"), df['FullPath'], rootDir)

# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split("/", 1).str[0]
df['SubDirs'] = df['FullPath'].str.split("/", 1).str[1]
# Account for NaN returns, which indicate the path is the root directory
df['SubDirs'] = np.where(df.SubDirs.isna(), '', df.SubDirs)

# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', 1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]

# Show only files; the output includes paths, so you don't necessarily need to display the individual directories.
df = df[df['Type'].str.contains('File')]

# Set columns to show and their order.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']]


# To convert sizes to be more human readable
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024

    return size


filesize = []  # Create an empty list to store file sizes to convert them to something more readable.

# Go through the items and convert the filesize from bytes to something more readable.
for items in df['Size'].items():
    filesize.append(convert_bytes(items[1]))

df['Size'] = filesize

# Send the data to an Excel workbook with sheets by parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=directory, index=False)
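For instance, with the definition above, the 2642-byte file from the sample CSV row comes out as:

print(convert_bytes(2642))  # -> 2.6 K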



