Python list directory, subdirectory, and files

I'm trying to make a script that lists all directories, subdirectories, and files in a given directory.

I tried this:

import sys, os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r, d, f in os.walk(path):
    for file in f:
        print(os.path.join(root, file))

Unfortunately it doesn't work properly. I get all the files, but not their complete paths.

For example, if the directory structure is:

/home/patate/directory/targetdirectory/123/456/789/file.txt

it will print:

/home/patate/directory/targetdirectory/file.txt

I need the first result.

Best answer

Use os.path.join to concatenate the directory and file name:

import os

for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))

Note the use of path instead of root in the concatenation, since using root would be incorrect.


In Python 3.4, the pathlib module (http://docs.python.org/dev/library/pathlib.html) was added to make path manipulation easier. The equivalent of os.path.join would then be:

pathlib.PurePath(path, name)

The advantage of pathlib is that you can use a variety of useful methods on paths. If you use the concrete Path variant, you can also do actual OS calls through them, such as changing into a directory, deleting the path, or opening the file it points to.
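As an illustration of the concrete Path class (a minimal sketch, reusing the path from the question), the whole recursive listing can also be written with Path.rglob:

from pathlib import Path

root = Path("/home/patate/directory/targetdirectory")

# Recursively print the full path of every file under root
for p in root.rglob("*"):
    if p.is_file():
        print(p)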

Other answers

In case you want to get all files inside a directory and its subdirectories matching some pattern (*.py for example):

import os
from fnmatch import fnmatch

root = "/some/directory"
pattern = "*.py"

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print(os.path.join(path, name))

Here is a one-liner:

import os

[val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk("./")] for val in sublist]
# Meta comment to ease selecting text

The outermost val for sublist ... loop flattens the nested lists into one dimension; the j loop collects each file basename and joins it to the current path; and finally the i loop iterates over all directories and subdirectories.

This example hard-codes the path in os.walk(...); you can supply any path string you like.
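Written out as plain loops (a sketch equivalent to the one-liner above), which may make the explanation easier to follow:

import os

all_files = []
for i in os.walk("./"):      # i is a (dirpath, dirnames, filenames) tuple
    for j in i[2]:           # j is a file basename
        all_files.append(os.path.join(i[0], j))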

Note: os.path.expanduser and/or os.path.expandvars can be used for path strings like ~/.
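For example (a small sketch; the home-relative path is hypothetical):

import os

# Expand ~ and any environment variables before walking the tree
root = os.path.expandvars(os.path.expanduser("~/projects"))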

Extending the original one-liner:

It is easy to add tests on the file basename and on the directory name.

For example, testing for *.jpg files:

... for j in i[2] if j.endswith(".jpg")] ...

Additionally, excluding the .git directory:

... for i in os.walk("./") if ".git" not in i[0].split("/")]
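Putting the two tests together (a sketch combining the *.jpg filter and the .git exclusion):

import os

jpgs = [os.path.join(i[0], j)
        for i in os.walk("./") if ".git" not in i[0].split("/")
        for j in i[2] if j.endswith(".jpg")]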

Another option would be to use the glob module from the standard library:

import glob

path = "/home/patate/directory/targetdirectory/**"

for path in glob.glob(path, recursive=True):
    print(path)
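Note that with recursive=True the ** pattern matches directories as well as files; if you only want files, they can be filtered out (a small sketch):

import glob
import os

pattern = "/home/patate/directory/targetdirectory/**"

# glob with recursive=True also returns directories; keep only regular files
files_only = [p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p)]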

If you need an iterator, you can use iglob as an alternative (my_path being the same recursive pattern as above):

for file in glob.iglob(my_path, recursive=True):
    # ...

A bit simpler one-liner:

import os
from itertools import product, chain

directory = "./"  # the directory to walk
chain.from_iterable([[os.sep.join(w) for w in product([i[0]], i[2])] for i in os.walk(directory)])

You can take a look at this sample I made. It uses the os.path.walk function, which is deprecated (and removed in Python 3). It uses a list to store all the file paths.

root = "Your root directory"
ex = ".txt"
where_to = "Wherever you wanna write your file to"

def fileWalker(ext, dirname, names):
       
    checks files in names   
    pat = "*" + ext[0]
    for f in names:
        if fnmatch.fnmatch(f, pat):
            ext[1].append(os.path.join(dirname, f))


def writeTo(fList):

    with open(where_to, "w") as f:
        for di_r in fList:
            f.write(di_r + "
")


if __name__ ==  __main__ :
    li = []
    os.path.walk(root, fileWalker, [ex, li])

    writeTo(li)
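Since os.path.walk no longer exists in Python 3, a rough equivalent using os.walk might look like this (a sketch, with file_walker as a hypothetical stand-in for fileWalker):

import os
import fnmatch

def file_walker(root, ext, out):
    """Append the full path of every file under root whose name matches *ext."""
    pat = "*" + ext
    for dirname, _subdirs, names in os.walk(root):
        for f in names:
            if fnmatch.fnmatch(f, pat):
                out.append(os.path.join(dirname, f))

li = []
file_walker("Your root directory", ".txt", li)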

Since every example here just uses walk (with join), I'd like to show a nice example and comparison with listdir:

import os, time

def listFiles1(root): # listdir
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0)+"/"; items = os.listdir(folder) # items = folders + files
        for i in items: i=folder+i; (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles2(root): # listdir/join (takes ~1.4x as long) (and uses "\\" instead)
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0); items = os.listdir(folder) # items = folders + files
        for i in items: i=os.path.join(folder,i); (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles3(root): # walk (takes ~1.5x as long)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[folder.replace("\\","/")+"/"+file] # folder+"\\"+file still ~1.5x
    return allFiles

def listFiles4(root): # walk/join (takes ~1.6x as long) (and uses "\\" instead)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[os.path.join(folder,file)]
    return allFiles


for i in range(100): files = listFiles1("src") # warm up

start = time.time()
for i in range(100): files = listFiles1("src") # listdir
print("Time taken: %.2fs"%(time.time()-start)) # 0.28s

start = time.time()
for i in range(100): files = listFiles2("src") # listdir and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.38s

start = time.time()
for i in range(100): files = listFiles3("src") # walk
print("Time taken: %.2fs"%(time.time()-start)) # 0.42s

start = time.time()
for i in range(100): files = listFiles4("src") # walk and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.47s

So, as you can see, the listdir version is the most efficient. (And join is slow.)

Just an addition to the above: with this you can get the data into CSV format:

import sys, os

try:
    import pandas as pd
except ImportError:
    os.system("pip3 install pandas")
    import pandas as pd

root = "/home/kiran/Downloads/MainFolder" # It may have many subfolders and files inside
lst = []
from fnmatch import fnmatch
pattern = "*.csv"      # I want to get only csv files
pattern = "*.*"        # Note: Use this pattern to get all types of files and folders
for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            lst.append((os.path.join(path, name)))
df = pd.DataFrame({"filePaths":lst})
df.to_csv("filepaths.csv")

A fairly simple solution is to run a couple of subprocess calls to export the files to CSV format:

import subprocess

# Global variables for directory being mapped

location = "."      # Enter the path here.
pattern = "*.py"    # Use this if you want to only return certain filetypes
rootDir = location.rpartition("/")[-1]
outputFile = rootDir + "_directory_contents.csv"

# Find the requested data and export to CSV, specifying a pattern if needed.
find_cmd = ("find " + location + " -name '" + pattern + "'" +
            " -fprintf " + outputFile + " '%Y%M,%n,%u,%g,%s,%A+,%P\\n'")
subprocess.call(find_cmd, shell=True)

This command produces comma-separated values that can be easily analyzed in Excel.

f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py

The resulting CSV file doesn't have a header row, but you can use a second command to add one.

# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)

Depending on how much data you get back, you can massage it further using pandas. Here are some things I found useful, especially if you're dealing with many levels of directories to dig through.

Add these imports:

import numpy as np
import pandas as pd

Then add this to your code:

# Create DataFrame from the CSV file created above.
df = pd.read_csv(outputFile)

# Format columns
# Get the filename and file extension from the filepath
df['FileName'] = df['FilePath'].str.rsplit("/", 1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit(".", 1).str[1]

# Get the full path to the files. If the path doesn't include a "/" it's the root directory
df['FullPath'] = df['FilePath'].str.rsplit("/", 1).str[0]
df['FullPath'] = np.where(df['FullPath'].str.contains("/"), df['FullPath'], rootDir)

# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split("/", 1).str[0]
df['SubDirs'] = df['FullPath'].str.split("/", 1).str[1]
# Account for NaN returns, which indicate the path is the root directory
df['SubDirs'] = np.where(df.SubDirs.isna(), '', df.SubDirs)

# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', 1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]

# Show only files; the output includes paths, so you don't necessarily need to display the individual directories.
df = df[df['Type'].str.contains('File')]

# Set columns to show and their order.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']]


# To convert sizes to be more human readable
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024

    return size


filesize = []  # Create an empty list to store file sizes to convert them to something more readable.

# Go through the items and convert the filesize from bytes to something more readable.
for items in df['Size'].items():
    filesize.append(convert_bytes(items[1]))

df['Size'] = filesize

# Send the data to an Excel workbook with sheets by parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=directory, index=False)
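For instance, with the definition above, the 2642-byte file from the sample CSV row comes out as:

print(convert_bytes(2642))  # -> 2.6 K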



