Question

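I need to read a large file word by word, and the file is one single, very long line. I know the usual ways of iterating over a file line by line, but they don't help here because of the single-line structure.

Any alternatives?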

Answer 1

It really depends on your definition of a word. But try this:

f = open("your-filename-here").read()
for word in f.split():
    # do something with word
    print(word)

This will use whitespace characters as word boundaries.

Of course, this is just a quick example.
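A slightly more careful version of the same idea might look like the sketch below. It still reads the whole file into memory, so it only suits files that fit comfortably in RAM. The function name `load_words` and the temporary file are made up for the illustration.

```python
import os
import tempfile

def load_words(path):
    """Return all whitespace-separated words in the file at `path`."""
    with open(path) as f:        # the file is closed when the block exits
        return f.read().split()  # no argument: split on any run of whitespace

# Usage with a throwaway single-line file (hypothetical content)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one two   three\tfour")
words = load_words(tmp.name)
os.remove(tmp.name)
print(words)
```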

Answer 2

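A long line? I assume the line is too big to reasonably fit in memory, so you want some kind of buffering.

First of all, this is a bad format; if you have any kind of control over the file, make it one word per line.

If not, use something like: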

def read_words(input_file):
    line = ''
    while True:
        word, space, line = line.partition(' ')
        if space:
            # A word was found
            yield word
        else:
            # A word was not found; read a chunk of data from file
            next_chunk = input_file.read(1000)
            if next_chunk:
                # Add the chunk to our line
                line = word + next_chunk
            else:
                # No more data; yield the last word and return
                yield word.rstrip('\n')
                return
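If the words may be separated by tabs, newlines, or several spaces in a row, a variant of the same buffering idea can lean on `str.split()` instead. This is only a sketch: the name `iter_words` and the `chunk_size` default are assumptions, not part of the answer above.

```python
import io

def iter_words(input_file, chunk_size=4096):
    """Yield whitespace-separated words, reading chunk_size characters at a time."""
    tail = ""  # possibly incomplete word carried over from the previous chunk
    while True:
        chunk = input_file.read(chunk_size)
        if not chunk:
            break
        parts = (tail + chunk).split()
        if not parts:
            # Only whitespace so far; nothing to carry over
            tail = ""
            continue
        if chunk[-1].isspace():
            # The chunk ended on a separator, so every part is a complete word
            tail = ""
        else:
            # The last part may continue in the next chunk; hold it back
            tail = parts.pop()
        for word in parts:
            yield word
    if tail:
        yield tail

# Usage with an in-memory file and a tiny chunk size to exercise the carry-over
sample = io.StringIO("one  two\nthree four")
print(list(iter_words(sample, chunk_size=3)))
```

Memory use stays bounded by the chunk size plus the longest single word, since only the unfinished tail is kept between reads.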

Answer 3

You really should consider using a generator:

def word_gen(file):
    for line in file:
        for word in line.split():
            yield word

with open("somefile") as f:
    for word in word_gen(f):
        print(word)

Answer 4

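There are more efficient ways of doing this, but syntactically, this might be the shortest:

words = open("myfile").read().split()

If memory is a concern, you won't want to do this, because it loads the entire thing into memory instead of iterating over it.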

Answer 5

I've answered a similar question before, but I have refined the method used in that answer and here is the updated version (copied from a recent answer):

Here is my totally functional approach which avoids having to read and split lines. It makes use of the itertools module:

Note for python 3, replace `itertools.imap` with `map`

import itertools

def readwords(mfile):
    byte_stream = itertools.groupby(
      itertools.takewhile(lambda c: bool(c),
          itertools.imap(mfile.read,
              itertools.repeat(1))), str.isspace)

    return ("".join(group) for pred, group in byte_stream if not pred)

Sample usage:

>>> import sys
>>> for w in readwords(sys.stdin):
...     print (w)
... 
I really love this new method of reading words in python
I
really
love
this
new
method
of
reading
words
in
python
           
It's soo very Functional!
It's
soo
very
Functional!
>>>

I guess in your case, this would be the way to use the function:

with open('words.txt', 'r') as f:
    for word in readwords(f):
        print(word)
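For reference, here is how the function might look on Python 3 once `itertools.imap` is swapped for the built-in `map`, as the note suggests. The `io.StringIO` input is just a stand-in for a real file.

```python
import io
import itertools

def readwords(mfile):
    # map() is lazy on Python 3, so this calls mfile.read(1) once per character
    # and stops at the first empty string, which signals end of file.
    chars = itertools.takewhile(bool, map(mfile.read, itertools.repeat(1)))
    # Group consecutive characters by whether they are whitespace
    byte_stream = itertools.groupby(chars, str.isspace)
    return ("".join(group) for pred, group in byte_stream if not pred)

# Usage with an in-memory text stream standing in for a file
for word in readwords(io.StringIO("I really  love\nthis")):
    print(word)
```

Reading one character per call keeps memory use tiny, though it is likely to be slower than reading in larger chunks.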

Answer 6

Read the line in as normal, then split it on whitespace to break it into words?

Something like:

word_list = loaded_string.split()

Answer 7

After reading the line, you could do:

l = len(pattern)
i = 0
while True:
    i = text.find(pattern, i)  # text holds the line that was read
    if i == -1:
        break
    print(text[i:i+l])  # or do whatever
    i += l
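The same loop can be wrapped in a generator so the caller decides what to do with each match. This is a sketch; `find_all` is a made-up name, and like the loop above it reports non-overlapping occurrences only.

```python
def find_all(text, pattern):
    """Yield the start index of each non-overlapping occurrence of pattern in text."""
    if not pattern:
        return  # an empty pattern would never advance the search position
    start = 0
    while True:
        start = text.find(pattern, start)
        if start == -1:
            return
        yield start
        start += len(pattern)  # skip past this match before searching again

print(list(find_all("abcabcab", "ab")))  # [0, 3, 6]
```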


Answer 8

Donald Miner's suggestion looks good. Simple and short. I used the following in some code I wrote a while ago:

l = []
f = open("filename.txt", "r")
for line in f:
    for word in line.split():
        l.append(word)

It is a longer version of what Donald Miner suggested.
