A simple solution is to run a couple of subprocess calls that export the file listing to CSV format:
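import subprocess

# Global variables for the directory being mapped
location = '.'  # Enter the path here.
pattern = '*.py'  # Use this if you want to only return certain filetypes
rootDir = location.rpartition('/')[-1]
outputFile = rootDir + '_directory_contents.csv'

# Find the requested data and export to CSV, specifying a pattern if needed.
# -fprintf is a GNU find extension. Format fields: %Y type, %M permissions, %n links,
# %u owner, %g group, %s size in bytes, %T+ modification time, %P path relative to location.
# The pattern is quoted so the shell does not expand it before find sees it.
find_cmd = (
    "find " + location + " -name '" + pattern + "' -fprintf " + outputFile
    + " '%Y%M,%n,%u,%g,%s,%T+,%P\\n'"
)
subprocess.call(find_cmd, shell=True)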
This command produces comma-separated values that can easily be analyzed in Excel.
f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py
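If you would rather not build a shell string, the same call can probably be expressed as an argument list, which sidesteps shell quoting of the pattern altogether. This is only a sketch: `build_find_args` is a hypothetical helper name of mine, and the format string here uses `%T+` (modification time) for the timestamp field.

```python
import subprocess


def build_find_args(location, pattern, output_file):
    """Return the find command as an argument list (hypothetical helper)."""
    # find itself expands the \n escape, so it is passed through literally.
    fmt = "%Y%M,%n,%u,%g,%s,%T+,%P\\n"
    return ["find", location, "-name", pattern, "-fprintf", output_file, fmt]


args = build_find_args(".", "*.py", "._directory_contents.csv")
print(args)

# Uncomment to actually run it; this needs GNU find because of -fprintf.
# subprocess.run(args, check=True)
```

Because no shell is involved, `*.py` reaches find unchanged and does not need extra quotes.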
The resulting CSV file doesn't have a header row, but you can use a second command to add one.
# Add headers to the CSV (sed keeps a backup of the original file with a .bak suffix)
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)
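If sed isn't available, or its in-place flag behaves differently on your system, the header can also be prepended in plain Python. The sketch below assumes the file is small enough to read into memory; `add_header` and `HEADER` are names I made up for illustration, and unlike `sed -i.bak` it does not keep a backup copy.

```python
import os
import tempfile

HEADER = "Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath"


def add_header(csv_path, header=HEADER):
    """Prepend a header line to csv_path in place (hypothetical helper)."""
    with open(csv_path, "r", encoding="utf-8") as f:
        body = f.read()
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write(header + "\n" + body)


# Small demonstration on a temporary file holding the sample row shown earlier.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.csv")
    with open(demo, "w", encoding="utf-8") as f:
        f.write("f-rwxrwxrwx,1,cathy,cathy,2642,"
                "2021-06-01+00:22:00.2970880000,content-audit.py\n")
    add_header(demo)
    with open(demo, encoding="utf-8") as f:
        first_line = f.readline().strip()

print(first_line)
```

If the file is only going to be read by pandas, another option is to skip this step and pass the column names through the `names` argument of `pd.read_csv`.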
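Depending on how much data you get back, you can process it further with pandas. Here are some things I found useful, especially if you're dealing with many levels of directories to look through.
Add these to your imports: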
import numpy as np
import pandas as pd
Then add the following to your code:
# Create DataFrame from the CSV file created above.
df = pd.read_csv(outputFile)

# Format columns
# Get the filename and file extension from the filepath
df['FileName'] = df['FilePath'].str.rsplit('/', n=1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]

# Get the full path to the files. If the path doesn't include a "/" it's the root directory
df['FullPath'] = df['FilePath'].str.rsplit('/', n=1).str[0]
df['FullPath'] = np.where(df['FilePath'].str.contains('/', na=False), df['FullPath'], rootDir)

# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split('/', n=1).str[0]
df['SubDirs'] = df['FullPath'].str.split('/', n=1).str[1]
# Account for NaN returns, which indicate the path has no subdirectories
df['SubDirs'] = df['SubDirs'].fillna('')

# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df['ModifiedTime'].str.rsplit('+', n=1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]

# Show only files; the output includes paths, so you don't necessarily need to display the individual directories.
df = df[df['Type'] == 'File']

# Set columns to show and their order.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']]
# To convert sizes to be more human readable (must be defined before it is used below)
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024
    # Anything of 1024 T or more falls through the loop, so report it in P.
    return "%3.1f %s" % (size, 'P')

# Go through the items and convert the filesize from bytes to something more readable.
df = df.assign(Size=df['Size'].map(convert_bytes))

# Send the data to an Excel workbook with sheets by parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=directory, index=False)
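To make the column logic easier to check by hand, here is a rough plain-Python sketch of what the path and timestamp splitting is meant to produce for a single row, following the intent of the comments in the code above. `split_path` and `split_timestamp` are hypothetical helpers of mine, not part of pandas, and the example paths are invented. One difference to be aware of: a missing extension comes back as an empty string here, whereas the pandas version yields NaN.

```python
def split_path(file_path, root_dir="."):
    """Split one relative path into the columns used above (illustrative only)."""
    file_name = file_path.rsplit("/", 1)[-1]
    # No dot in the name means there is no extension.
    file_ext = file_name.rsplit(".", 1)[1] if "." in file_name else ""
    # A path without "/" sits directly in the root directory.
    full_path = file_path.rsplit("/", 1)[0] if "/" in file_path else root_dir
    # The first path component is the parent; the rest are its subdirectories.
    parent_dir, _, sub_dirs = full_path.partition("/")
    return {
        "FileName": file_name,
        "FileExt": file_ext,
        "FullPath": full_path,
        "ParentDir": parent_dir,
        "SubDirs": sub_dirs,
    }


def split_timestamp(stamp):
    """Split a 'date+time.fraction' stamp into (date, time without fraction)."""
    date, _, time = stamp.rpartition("+")
    return date, time.split(".")[0]


nested = split_path("docs/api/readme.md")
top_level = split_path("content-audit.py")
date, time = split_timestamp("2021-06-01+00:22:00.2970880000")

print(nested)     # ParentDir is "docs", SubDirs is "api"
print(top_level)  # FullPath and ParentDir fall back to root_dir
print(date, time)
```

Running a few representative paths through something like this before processing a large listing is a cheap way to confirm the sheets will be grouped the way you expect.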