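I wrote a function, select_walk(), to search for and select files in a directory tree.

In the following example, the files searched for are files with the extensions .dat, .rtf and .jpeg, in directories whose names match the following regex pattern:

    r'J:\\f[ruv]?o+\\\w+\\b[ae]r(\d+)?\\(?(1)TURI\1\d*|MONO\d+)'

Note the presence of the conditional elementary pattern

    (?(1)TURI\1\d*|MONO\d+)

with the group references (1) and \1 to the number-matching group (\d+) in the elementary pattern b[ae]r(\d+)?. If that group matched (the directory is named barNN or berNN), the next directory must be named TURI followed by the same digits NN, possibly followed by more digits; if it did not match (the directory is named just bar or ber), the next directory must be named MONO followed by digits.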
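A minimal sketch of how the conditional behaves on its own, restricted to the last two directory levels. Assumptions: '/' is used as the separator instead of the post's \\ to keep the strings readable, a $ anchor is added so that the whole child name must match, and levels_match() is a hypothetical helper.

```python
import re

# Assumption: only the last two directory levels of the post's pattern,
# '/' as separator instead of '\\', and '$' so the whole child name must match.
LEVELS = re.compile(r'b[ae]r(\d+)?/(?(1)TURI\1\d*|MONO\d+)$')

def levels_match(path):
    """Hypothetical helper: True if 'path' is a bar/ber directory plus an allowed child."""
    return LEVELS.match(path) is not None

for p in ('bar25/TURI2501', 'bar25/TURI4813', 'bar25/MONO8',
          'ber/MONO532', 'ber/TURI30'):
    print(p, levels_match(p))
```

Only bar25/TURI2501 and ber/MONO532 pass: once bar25 has set group 1 to 25, the child must start with TURI25, and a MONO name is accepted only under a bar or ber directory without digits.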
1)
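Here is the code that creates the example directory tree.
(Careful: it first deletes the directories J:\foo, J:\fooo, J:\froooo and J:\faooo if they exist, then creates them again.)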
```python
import os
from shutil import rmtree

top = 'J:\\'   # root of the example tree (a Windows drive)

# Careful: delete any previous version of the example directories
for x in ('foo\\', 'fooo\\', 'froooo\\', 'faooo\\'):
    if os.path.isdir(top + x):
        rmtree(top + x)

# (directory, subdirectories to create in it); parents are listed before children
li = [('foo\\', ('basil\\', 'poto%\\', 'tamata\\')),
      ('foo\\basil\\', ('ber89', 'ber300')),
      ('foo\\basil\\ber89\\', ('TURI850', 'TURI1023')),
      ('foo\\poto%\\', ('ocean', 'earth')),
      ('foo\\tamata\\', ('vahine',)),
      ('fooo\\', ('york#\\', 'plain\\', 'atlantis\\')),
      ('fooo\\york#\\', ('noto', 'nata')),
      ('fooo\\plain\\', ('zx13ao', 'ws89rt', 'bar999')),
      ('fooo\\plain\\bar999\\', ('TURI99905', 'TURI2227', 'MONO2')),
      ('fooo\\plain\\bar999\\TURI99905\\', ('AERIAL', 'minidisc')),
      ('fooo\\plain\\bar999\\TURI99905\\AERIAL\\', ('bumbum', 'corean')),
      ('fooo\\atlantis\\', ('atlABC', 'atlDEFG')),
      ('fooo\\atlantis\\atlABC\\', ('atlantis_sound', 'atlantis_image')),
      ('froooo\\', ('one_dir\\', 'another_dir\\')),
      ('froooo\\one_dir\\', ('bar25', 'ber')),
      ('froooo\\one_dir\\bar25\\', ('TURI2501', 'TURI2502', 'TURI4813', 'MONO8')),
      ('froooo\\one_dir\\ber\\', ('TURI30', 'TURI', 'MONO532')),
      ('froooo\\another_dir\\', ('notseen', 'notseen2')),
      ('faooo\\', ('somolo-\\', 'samala+\\'))]

for rep, several in li:
    if not os.path.isdir(top + rep):
        os.mkdir(top + rep)
    for name in several:
        os.mkdir(top + rep + name)

# Create the (empty) files
for filepath in (top + r'foo\kalaomi.xls',
                 top + r'foo\basil\ber89\TURI850\quetzal.jpeg',
                 top + r'foo\basil\ber89\TURI850\tehoi.txt',
                 top + r'foo\poto%\curcuma in poto%.txt',
                 top + r'foo\poto%\ocean\file in ocean.rtf',
                 top + r'foo\tamata\vahine\tahiti.jpeg',
                 top + r'fooo\york#\yorkshire.jpeg',
                 top + r'fooo\plain\bar999\TURI99905\galileo.jpeg',
                 top + r'fooo\plain\bar999\TURI99905\polynesia.dat',
                 top + r'fooo\plain\bar999\TURI99905\concrete.txt',
                 top + r'fooo\plain\bar999\TURI2227\Monroe.jpeg',
                 top + r'fooo\plain\bar999\MONO2\elastic.jpeg',
                 top + r'froooo\one_dir\photo in one_dir.jpeg',
                 top + r'froooo\one_dir\tabula.xls',
                 top + r'froooo\one_dir\bar25\TURI2501\matallelo.jpeg',
                 top + r'froooo\one_dir\bar25\TURI2501\italy.dat',
                 top + r'froooo\one_dir\bar25\TURI2501\beretta.xls',
                 top + r'froooo\one_dir\bar25\TURI2501\turi2501_ser.rtf',
                 top + r'froooo\one_dir\bar25\TURI4813\boaf_inTURI4813.jpeg',
                 top + r'froooo\one_dir\bar25\TURI4813\troui_in_TURI4813.txt',
                 top + r'froooo\one_dir\bar25\MONO8\in_mono8.dat',
                 top + r'froooo\one_dir\bar25\MONO8\in_mono8.rtf',
                 top + r'froooo\one_dir\bar25\MONO8\in_mono8.xls',
                 top + r'froooo\one_dir\bar25\TURI2502\adamante.jpeg',
                 top + r'froooo\one_dir\bar25\TURI2502\egyptic.txt',
                 top + r'froooo\one_dir\bar25\TURI2502\urubu.rtf',
                 top + r'froooo\one_dir\ber\MONO532\bacillus.jpeg',
                 top + r'froooo\one_dir\ber\MONO532\blueberry.dat',
                 top + r'froooo\one_dir\ber\MONO532\Perfume.doc',
                 top + r'faooo\samala+\kfaz.dat',
                 top + r'faooo\somolo-\ytek.rtf',
                 top + r'faooo\123.txt',
                 top + r'faooo\458.rtf'):
    with open(filepath, 'w'):
        pass   # just create an empty file
```
This code creates the following tree:

```
J:
|
|--foo
|   |--basil
|   |   |--ber89
|   |   |   |--TURI850
|   |   |   |   |--file quetzal.jpeg
|   |   |   |   |--file tehoi.txt
|   |   |   |--TURI1023
|   |   |--ber300
|   |--poto%
|   |   |--ocean
|   |   |   |--file in ocean.rtf
|   |   |--earth
|   |   |--file curcuma in poto%.txt
|   |--tamata
|   |   |--vahine
|   |   |   |--file tahiti.jpeg
|   |--file kalaomi.xls
|
|--fooo
|   |--york#
|   |   |--noto
|   |   |--nata
|   |   |--file yorkshire.jpeg
|   |--plain
|   |   |--zx13ao
|   |   |--ws89rt
|   |   |--bar999
|   |   |   |--TURI99905
|   |   |   |   |--AERIAL
|   |   |   |   |   |--bumbum
|   |   |   |   |   |--corean
|   |   |   |   |--minidisc
|   |   |   |   |--file galileo.jpeg
|   |   |   |   |--file polynesia.dat
|   |   |   |   |--file concrete.txt
|   |   |   |--TURI2227
|   |   |   |   |--file Monroe.jpeg
|   |   |   |--MONO2
|   |   |   |   |--file elastic.jpeg
|   |--atlantis
|   |   |--atlABC
|   |   |   |--atlantis_sound
|   |   |   |--atlantis_image
|   |   |--atlDEFG
|
|--froooo
|   |--one_dir
|   |   |--bar25
|   |   |   |--TURI2501
|   |   |   |   |--file matallelo.jpeg
|   |   |   |   |--file italy.dat
|   |   |   |   |--file beretta.xls
|   |   |   |   |--file turi2501_ser.rtf
|   |   |   |--TURI2502
|   |   |   |   |--file adamante.jpeg
|   |   |   |   |--file egyptic.txt
|   |   |   |   |--file urubu.rtf
|   |   |   |--TURI4813
|   |   |   |   |--file boaf_inTURI4813.jpeg
|   |   |   |   |--file troui_in_TURI4813.txt
|   |   |   |--MONO8
|   |   |   |   |--file in_mono8.dat
|   |   |   |   |--file in_mono8.rtf
|   |   |   |   |--file in_mono8.xls
|   |   |--ber
|   |   |   |--TURI30
|   |   |   |--TURI
|   |   |   |--MONO532
|   |   |   |   |--file bacillus.jpeg
|   |   |   |   |--file blueberry.dat
|   |   |   |   |--file Perfume.doc
|   |   |--file photo in one_dir.jpeg
|   |   |--file tabula.xls
|   |--another_dir
|   |   |--notseen
|   |   |--notseen2
|
|--faooo
|   |--somolo-
|   |   |--file ytek.rtf
|   |--samala+
|   |   |--file kfaz.dat
|   |--file 123.txt
|   |--file 458.rtf
```
The regex matching the files is:

    r'J:\\f[ruv]?o+\\\w+\\b[ae]r(\d+)?\\(?(1)TURI\1\d*|MONO\d+)\\\w+\.(dat|rtf|jpeg)'

The directories that a selective exploration must reach to find such files are:

    J:\fooo\plain\bar999\TURI99905
    J:\froooo\one_dir\bar25\TURI2501
    J:\froooo\one_dir\bar25\TURI2502
    J:\froooo\one_dir\ber\MONO532
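For comparison, the simplest approach, with no pruning at all, tests every file path against the complete pattern. A minimal sketch on a few path strings copied from the tree above (no file system access); using fullmatch() instead of match(), so that the whole path has to match, is an assumption of this sketch:

```python
import re

# The post's file pattern (backslash-separated Windows paths)
pat_file = r'J:\\f[ruv]?o+\\\w+\\b[ae]r(\d+)?\\(?(1)TURI\1\d*|MONO\d+)\\\w+\.(dat|rtf|jpeg)'
regx_file = re.compile(pat_file)

# A few file paths taken from the example tree
paths = [r'J:\foo\basil\ber89\TURI850\quetzal.jpeg',
         r'J:\fooo\plain\bar999\TURI99905\galileo.jpeg',
         r'J:\fooo\plain\bar999\TURI99905\concrete.txt',
         r'J:\fooo\plain\bar999\MONO2\elastic.jpeg',
         r'J:\froooo\one_dir\bar25\TURI2501\turi2501_ser.rtf',
         r'J:\froooo\one_dir\bar25\MONO8\in_mono8.dat',
         r'J:\froooo\one_dir\ber\MONO532\blueberry.dat',
         r'J:\faooo\samala+\kfaz.dat']

# Brute force: every path is tested against the whole pattern
selected = [p for p in paths if regx_file.fullmatch(p)]
for p in selected:
    print(p)
```

On a real disk, this approach has to visit every directory of the tree before testing the paths, which is exactly what select_walk() avoids.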
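2)
As a preliminary demonstration, here is a function, compute_regexes(), containing the part of select_walk() that builds the regexes needed to explore only the selected directories during the walk through the tree, and to return only the selected files: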
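```python
import re
from os import sep

def compute_regexes(pat_file, displ=True):
    # How the path separator is written inside a pattern: \\ on Windows, / elsewhere
    sep_pat = r'\\' if sep == '\\' else '/'
    splitted_pat = re.split(re.escape(sep_pat), pat_file)
    # Pattern for the direct parent directory of the wanted files: all elements but the last
    pat_parent_dir = sep_pat.join(splitted_pat[0:-1])
    if displ:
        print('IN FUNCTION compute_regexes() :\n'
              '\npat_file== %s\n'
              '\nsplitted_pat :\n%s\n'
              '\npat_parent_dir== %s\n'
              % (pat_file, '\n'.join(splitted_pat), pat_parent_dir))

    # dgr maps a group number to the index of the element containing that group.
    # Rough heuristic: each element holding parentheses is counted as one group.
    dgr = {}
    for i, el in enumerate(splitted_pat):
        if re.search(r'\(.*?\)', el):
            dgr[len(dgr) + 1] = i
    if displ:
        print('dgr :')
        print('\n'.join('group(%s) is in splitted_pat[%s]' % (g, i)
                        for g, i in dgr.items()))

    # In pat_dirs, a group lying in element i is preceded by i added groups,
    # so a reference to group n must become n + dgr[n]
    def repl(mat, dgr=dgr):
        the = int(mat.group(1) if mat.group(1) else mat.group(2))
        return str(the + dgr[the])
    for i, el in enumerate(splitted_pat):
        # renumber the n of (?(n)...) and of the backreference \n
        splitted_pat[i] = re.sub(r'(?<=\(\?\()(\d+)(?=\))|(?<=\\)(\d+)', repl, el)

    # Nest the directory elements into optional groups: J:(?=\\|)(\\e1(?=\\|)(\\e2...)?)?
    # (the lookahead (?=\\|) always succeeds; callers compare the match with the whole path)
    pat_dirs = ''
    for x in splitted_pat[-2:0:-1]:
        pat_dirs = r'(?=%s|)(%s%s%s)?' % (sep_pat, sep_pat, x, pat_dirs)
    pat_dirs = splitted_pat[0] + pat_dirs
    if displ:
        print('\npat_dirs==', pat_dirs)

    return re.compile(pat_file), re.compile(pat_dirs), re.compile(pat_parent_dir)
```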
```python
pat_file = r'J:\\f[ruv]?o+\\\w+\\b[ae]r(\d+)?\\(?(1)TURI\1\d*|MONO\d+)\\\w+\.(dat|rtf|jpeg)'
regx_file, regx_dirs, regx_parent_dir = compute_regexes(pat_file)

print('\nEXAMPLES with regx_file :')
print('pat_file==', pat_file)
for filepath in (r'J:\fooo\basil\ber92\TURI9258\beru.rtf',
                 r'J:\froooooo\ki_ki\bar\MONO47\madrid.jpeg'):
    print(filepath, bool(regx_file.match(filepath)))

print('\nEXAMPLES with regx_dirs :')
for path in (r'J:\fooo',
             r'J:\fooo\basil',
             r'J:\fooo\basil\ber92',
             r'J:\fooo\basil\ber92\TURI777',
             r'J:\fooo\basil\ber92\TURI9258',
             r'J:\froooooo',
             r'J:\froooooo\ki_ki',
             r'J:\froooooo\ki_ki\bar',
             r'J:\froooooo\ki=ki\bar',
             r'J:\froooooo\ki_ki\bar\MONO47'):
    # A directory is accepted only if regx_dirs matches its whole path
    mat = regx_dirs.match(path)
    if mat and mat.group() == path:
        print(path, ": ~~ this dir's name is OK ~~")
    else:
        print(path, ": ## this dir's name doesn't match ##")
```

The function compute_regexes() first splits the original pat_file regex pattern into elements, each aimed at matching the name of one directory in a path (the last element matches the file name).

Then it computes:
- pat_parent_dir, the pattern matching the path of a directory that directly contains wanted files (pat_file without its last element);
- pat_dirs, a pattern of nested optional groups, one per directory level, matching the path of any directory that may lie on the way to such a parent directory.

The treatment involving dgr and the function repl() is a refinement that lets compute_regexes() take the group references (that is, the special sequences \1, \2, etc., and the group numbers in conditionals such as (?(1)...)) into account, and renumber them so that in pat_dirs they still point to the correct groups despite the parentheses added to build pat_dirs.

Output of this code:

```
IN FUNCTION compute_regexes() :

pat_file== J:\\f[ruv]?o+\\\w+\\b[ae]r(\d+)?\\(?(1)TURI\1\d*|MONO\d+)\\\w+\.(dat|rtf|jpeg)

splitted_pat :
J:
f[ruv]?o+
\w+
b[ae]r(\d+)?
(?(1)TURI\1\d*|MONO\d+)
\w+\.(dat|rtf|jpeg)

pat_parent_dir== J:\\f[ruv]?o+\\\w+\\b[ae]r(\d+)?\\(?(1)TURI\1\d*|MONO\d+)

dgr :
group(1) is in splitted_pat[3]
group(2) is in splitted_pat[4]
group(3) is in splitted_pat[5]

pat_dirs== J:(?=\\|)(\\f[ruv]?o+(?=\\|)(\\\w+(?=\\|)(\\b[ae]r(\d+)?(?=\\|)(\\(?(4)TURI\4\d*|MONO\d+))?)?)?)?

EXAMPLES with regx_file :
pat_file== J:\\f[ruv]?o+\\\w+\\b[ae]r(\d+)?\\(?(1)TURI\1\d*|MONO\d+)\\\w+\.(dat|rtf|jpeg)
J:\fooo\basil\ber92\TURI9258\beru.rtf True
J:\froooooo\ki_ki\bar\MONO47\madrid.jpeg True

EXAMPLES with regx_dirs :
J:\fooo : ~~ this dir's name is OK ~~
J:\fooo\basil : ~~ this dir's name is OK ~~
J:\fooo\basil\ber92 : ~~ this dir's name is OK ~~
J:\fooo\basil\ber92\TURI777 : ## this dir's name doesn't match ##
J:\fooo\basil\ber92\TURI9258 : ~~ this dir's name is OK ~~
J:\froooooo : ~~ this dir's name is OK ~~
J:\froooooo\ki_ki : ~~ this dir's name is OK ~~
J:\froooooo\ki_ki\bar : ~~ this dir's name is OK ~~
J:\froooooo\ki=ki\bar : ## this dir's name doesn't match ##
J:\froooooo\ki_ki\bar\MONO47 : ~~ this dir's name is OK ~~
```
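To see why the renumbering done with dgr and repl() is needed, here is a minimal sketch comparing the nested directory pattern built with the original references against the renumbered one. Assumptions: '/' replaces the \\ separator to keep the strings short, the always-true lookaheads (?=\\|) are left out, and compiles() and dir_ok() are hypothetical helpers. With the original references, \1 would designate the outermost added group, which is still open at that point, so Python's re module refuses to compile the pattern.

```python
import re

# Directory part of the post's pattern nested into optional groups ('/' separator).
# Naive nesting keeps the original references (?(1)...) and \1:
naive = r'J:(/f[ruv]?o+(/\w+(/b[ae]r(\d+)?(/(?(1)TURI\1\d*|MONO\d+))?)?)?)?'
# After renumbering as compute_regexes() does, they point at (\d+), now group 4:
renumbered = r'J:(/f[ruv]?o+(/\w+(/b[ae]r(\d+)?(/(?(4)TURI\4\d*|MONO\d+))?)?)?)?'

def compiles(pattern):
    """Hypothetical helper: True if 'pattern' compiles, else print the error."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print('re.error:', exc)
        return False
    return True

def dir_ok(regx, path):
    """The post's test: the match must cover the whole path."""
    mat = regx.match(path)
    return mat is not None and mat.group() == path

print(compiles(naive))   # \1 refers to group 1, which is still open at that point
regx_dirs = re.compile(renumbered)
for p in ('J:/fooo/plain/bar999/TURI99905',
          'J:/fooo/plain/bar999/TURI2227',
          'J:/froooo/one_dir/ber/MONO532'):
    print(p, dir_ok(regx_dirs, p))
```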
3)
Finally, here's the function select_walk(), which does the job of searching a tree for files whose paths match a given regex.
It yields the triples (dirpath, dirnames, filenames) produced by os.walk(), but only for the directories whose paths match the parent-directory part of pat_file, and with filenames reduced to the names of the files matching pat_file.
During the iteration, select_walk() does not explore the directories whose own names already rule out any match of the key regex pattern pat_file by the files they contain.
```python
import os
import re
from os import sep

def select_walk(pat_file, start_dir):
    # Same regex construction as in compute_regexes()
    sep_pat = r'\\' if sep == '\\' else '/'   # the separator as written in a pattern
    splitted_pat = re.split(re.escape(sep_pat), pat_file)
    pat_parent_dir = sep_pat.join(splitted_pat[0:-1])
    dgr = {}
    for i, el in enumerate(splitted_pat):
        if re.search(r'\(.*?\)', el):
            dgr[len(dgr) + 1] = i

    def repl(mat, dgr=dgr):
        the = int(mat.group(1) if mat.group(1) else mat.group(2))
        return str(the + dgr[the])

    for i, el in enumerate(splitted_pat):
        splitted_pat[i] = re.sub(r'(?<=\(\?\()(\d+)(?=\))|(?<=\\)(\d+)', repl, el)
    pat_dirs = ''
    for x in splitted_pat[-2:0:-1]:
        pat_dirs = r'(?=%s|)(%s%s%s)?' % (sep_pat, sep_pat, x, pat_dirs)
    pat_dirs = splitted_pat[0] + pat_dirs
    print('pat_dirs==', pat_dirs)

    regx_file = re.compile(pat_file)
    regx_dirs = re.compile(pat_dirs)
    regx_parent_dir = re.compile(pat_parent_dir)

    start_dir = start_dir.rstrip(sep) + sep
    print('\nstart_dir == ' + start_dir)

    for dirpath, dirnames, filenames in os.walk(start_dir):
        dirpath = dirpath.rstrip(sep)
        is_parent = bool(regx_parent_dir.match(dirpath))
        print('\nexplored dirpath : %s is_direct_parent: %s'
              % (dirpath, ('NO', 'YES')[is_parent]))
        print('dirnames : %s' % dirnames)
        print('filenames : %s' % filenames)
        if is_parent:
            # Direct parent: keep only the matching files and stop descending
            filenames[:] = [filename for filename in filenames
                            if regx_file.match(dirpath + sep + filename)]
            dirnames[:] = []
            print('dirnames : not to be explored')
            print('yielded filenames : %s' % filenames)
            yield (dirpath, dirnames, filenames)
        else:
            # Prune in place: os.walk() only descends into the names left in
            # dirnames, i.e. those whose whole path is matched by regx_dirs
            kept = []
            for dirname in dirnames:
                path = dirpath + sep + dirname
                mat = regx_dirs.match(path)
                if mat and mat.group() == path:
                    kept.append(dirname)
            dirnames[:] = kept
            print('dirnames to explore : %s' % dirnames)
            print('filenames : not to be yielded')


pat_file = r'J:\\f[ruv]?o+\\\w+\\b[ae]r(\d+)?\\(?(1)TURI\1\d*|MONO\d+)\\\w+\.(dat|rtf|jpeg)'
print('\n\nSELECTED (dirpath, dirnames, filenames) :\n'
      + '\n'.join(map(repr, select_walk(pat_file, 'J:\\'))))
```
Result:

```
pat_dirs== J:(?=\\|)(\\f[ruv]?o+(?=\\|)(\\\w+(?=\\|)(\\b[ae]r(\d+)?(?=\\|)(\\(?(4)TURI\4\d*|MONO\d+))?)?)?)?

start_dir == J:\

explored dirpath : J: is_direct_parent: NO
dirnames : ['Amazon', 'faooo', 'Favorites', 'foo', 'fooo', 'froooo', 'Python', 'RECYCLER', 'System Volume Information']
filenames : ['image00.pfm', 'rep.py']
dirnames to explore : ['foo', 'fooo', 'froooo']
filenames : not to be yielded

explored dirpath : J:\foo is_direct_parent: NO
dirnames : ['basil', 'poto%', 'tamata']
filenames : ['kalaomi.xls']
dirnames to explore : ['basil', 'tamata']
filenames : not to be yielded

explored dirpath : J:\foo\basil is_direct_parent: NO
dirnames : ['ber300', 'ber89']
filenames : []
dirnames to explore : ['ber300', 'ber89']
filenames : not to be yielded

explored dirpath : J:\foo\basil\ber300 is_direct_parent: NO
dirnames : []
filenames : []
dirnames to explore : []
filenames : not to be yielded

explored dirpath : J:\foo\basil\ber89 is_direct_parent: NO
dirnames : ['TURI1023', 'TURI850']
filenames : []
dirnames to explore : []
filenames : not to be yielded

explored dirpath : J:\foo\tamata is_direct_parent: NO
dirnames : ['vahine']
filenames : []
dirnames to explore : []
filenames : not to be yielded

explored dirpath : J:\fooo is_direct_parent: NO
dirnames : ['atlantis', 'plain', 'york#']
filenames : []
dirnames to explore : ['atlantis', 'plain']
filenames : not to be yielded

explored dirpath : J:\fooo\atlantis is_direct_parent: NO
dirnames : ['atlABC', 'atlDEFG']
filenames : []
dirnames to explore : []
filenames : not to be yielded

explored dirpath : J:\fooo\plain is_direct_parent: NO
dirnames : ['bar999', 'ws89rt', 'zx13ao']
filenames : []
dirnames to explore : ['bar999']
filenames : not to be yielded

explored dirpath : J:\fooo\plain\bar999 is_direct_parent: NO
dirnames : ['MONO2', 'TURI2227', 'TURI99905']
filenames : []
dirnames to explore : ['TURI99905']
filenames : not to be yielded

explored dirpath : J:\fooo\plain\bar999\TURI99905 is_direct_parent: YES
dirnames : ['AERIAL', 'minidisc']
filenames : ['concrete.txt', 'galileo.jpeg', 'polynesia.dat']
dirnames : not to be explored
yielded filenames : ['galileo.jpeg', 'polynesia.dat']

explored dirpath : J:\froooo is_direct_parent: NO
dirnames : ['another_dir', 'one_dir']
filenames : []
dirnames to explore : ['another_dir', 'one_dir']
filenames : not to be yielded

explored dirpath : J:\froooo\another_dir is_direct_parent: NO
dirnames : ['notseen', 'notseen2']
filenames : []
dirnames to explore : []
filenames : not to be yielded

explored dirpath : J:\froooo\one_dir is_direct_parent: NO
dirnames : ['bar25', 'ber']
filenames : ['photo in one_dir.jpeg', 'tabula.xls']
dirnames to explore : ['bar25', 'ber']
filenames : not to be yielded

explored dirpath : J:\froooo\one_dir\bar25 is_direct_parent: NO
dirnames : ['MONO8', 'TURI2501', 'TURI2502', 'TURI4813']
filenames : []
dirnames to explore : ['TURI2501', 'TURI2502']
filenames : not to be yielded

explored dirpath : J:\froooo\one_dir\bar25\TURI2501 is_direct_parent: YES
dirnames : []
filenames : ['beretta.xls', 'italy.dat', 'matallelo.jpeg', 'turi2501_ser.rtf']
dirnames : not to be explored
yielded filenames : ['italy.dat', 'matallelo.jpeg', 'turi2501_ser.rtf']

explored dirpath : J:\froooo\one_dir\bar25\TURI2502 is_direct_parent: YES
dirnames : []
filenames : ['adamante.jpeg', 'egyptic.txt', 'urubu.rtf']
dirnames : not to be explored
yielded filenames : ['adamante.jpeg', 'urubu.rtf']

explored dirpath : J:\froooo\one_dir\ber is_direct_parent: NO
dirnames : ['MONO532', 'TURI', 'TURI30']
filenames : []
dirnames to explore : ['MONO532']
filenames : not to be yielded

explored dirpath : J:\froooo\one_dir\ber\MONO532 is_direct_parent: YES
dirnames : []
filenames : ['bacillus.jpeg', 'blueberry.dat', 'Perfume.doc']
dirnames : not to be explored
yielded filenames : ['bacillus.jpeg', 'blueberry.dat']


SELECTED (dirpath, dirnames, filenames) :
('J:\\fooo\\plain\\bar999\\TURI99905', [], ['galileo.jpeg', 'polynesia.dat'])
('J:\\froooo\\one_dir\\bar25\\TURI2501', [], ['italy.dat', 'matallelo.jpeg', 'turi2501_ser.rtf'])
('J:\\froooo\\one_dir\\bar25\\TURI2502', [], ['adamante.jpeg', 'urubu.rtf'])
('J:\\froooo\\one_dir\\ber\\MONO532', [], ['bacillus.jpeg', 'blueberry.dat'])
```
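The core of select_walk() is the in-place pruning of dirnames: with os.walk()'s default top-down order, the walk only descends into the names left in that list. Below is a minimal, portable sketch of that idea, not the code above. Assumptions: pruned_walk(), select_files() and build_tree() are hypothetical helpers, the patterns are rewritten by hand for '/'-separated paths relative to the walked root, and fullmatch() replaces the match().group() == path comparison.

```python
import os
import re
import tempfile

def pruned_walk(root, keep_dir):
    """Top-down os.walk(root) that only descends into subdirectories whose
    '/'-separated path relative to root satisfies keep_dir(path)."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root).replace(os.sep, '/')
        prefix = '' if rel == '.' else rel + '/'
        # Slice assignment mutates the very list os.walk() reads afterwards,
        # so the names removed here are never visited.
        dirnames[:] = [d for d in dirnames if keep_dir(prefix + d)]
        yield prefix, dirnames, filenames

# The post's patterns, rewritten (assumption) for relative '/'-separated paths.
# In PAT_DIRS the (\d+) group is group 3, hence (?(3)...) and \3.
PAT_DIRS = re.compile(r'f[ruv]?o+(/\w+(/b[ae]r(\d+)?(/(?(3)TURI\3\d*|MONO\d+))?)?)?')
PAT_FILE = re.compile(r'f[ruv]?o+/\w+/b[ae]r(\d+)?/(?(1)TURI\1\d*|MONO\d+)/\w+\.(dat|rtf|jpeg)')

def select_files(root):
    """Hypothetical helper: return (number of visited directories, matching files)."""
    visited, found = 0, []
    for prefix, _dirnames, filenames in pruned_walk(
            root, lambda p: PAT_DIRS.fullmatch(p) is not None):
        visited += 1
        found += [prefix + f for f in filenames if PAT_FILE.fullmatch(prefix + f)]
    return visited, sorted(found)

def build_tree(root, rel_files):
    """Hypothetical helper: create empty files, and their directories, under root."""
    for rel in rel_files:
        path = os.path.join(root, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()

# A small subset of the example tree (16 directories including the root)
SAMPLE = ['fooo/plain/bar999/TURI99905/galileo.jpeg',
          'fooo/plain/bar999/TURI99905/concrete.txt',
          'fooo/plain/bar999/TURI2227/Monroe.jpeg',
          'fooo/york#/yorkshire.jpeg',
          'froooo/one_dir/bar25/TURI2502/urubu.rtf',
          'froooo/one_dir/bar25/TURI4813/troui_in_TURI4813.txt',
          'froooo/one_dir/ber/MONO532/blueberry.dat',
          'faooo/samala+/kfaz.dat']

with tempfile.TemporaryDirectory() as root:
    build_tree(root, SAMPLE)
    print(select_files(root))
```

With this sample, the walk never enters faooo (nor samala+ below it), york#, TURI2227 or TURI4813, so only 11 of the 16 directories are visited. Because the nested optional groups of PAT_DIRS stop at the TURI/MONO level, this variant needs no separate parent-directory regex: anything below a TURI or MONO directory is pruned by PAT_DIRS itself.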