Specifying chars in python

I need a function that iterates over all the lines in the file.
Here's what I have so far:

def LineFeed(file):
    ret = ""
    for byte in file:
        ret = ret + str(byte)
        if str(byte) == '\r':
            yield ret
            ret = ""

All the lines in the file end with \r (not \n), and I'm reading it in "rb" mode (I have to read this file in binary). The yield doesn't work and returns nothing. Maybe there's a problem with the comparison? I'm just not sure how you represent a byte/char in Python.

I'm getting the idea that if you for-loop over an "rb" file, it still tries to iterate over lines, not bytes. How can I iterate over bytes? My problem is that I don't have standard line endings. Also, my file is filled with 0x00 bytes and I would like to get rid of them all, so I think I would need a second yield function. How could I implement that? I just don't know how to represent the 0x00 byte, or the NULL char, in Python.

Best answer

If you're in control of how you open the file, I'd recommend opening it with universal newlines, since \r isn't recognized as a linefeed character if you just use 'rb' mode, but it is if you use 'Urb'.

This will only work if you aren't including \n as well as \r in your binary file somewhere, since the distinction between \r and \n is lost when using universal newlines.

Assuming you want your yielded lines to still be \r terminated:

NUL = '\x00'
def lines_without_nulls(path):
    with open(path, 'Urb') as f:
        for line in f:
            yield line.replace(NUL, '').replace('\n', '\r')
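On Python 3, where the 'U' flag is no longer available (it was removed in 3.11), a rough equivalent might look like the sketch below. It is not the answer's original code: it assumes you can accept str instead of bytes, and it uses the `newline="\r"` argument of `open()` so that only \r ends a line. The latin-1 encoding is chosen because it maps every byte value to one character, so nothing is lost in decoding. The sample file contents are made up.

```python
import os
import tempfile


def lines_without_nulls(path):
    # Yield CR-terminated lines from path with NUL characters removed.
    # latin-1 maps each byte 0-255 to the code point with the same value,
    # so decoding never fails (assumption: str output is acceptable).
    with open(path, "r", encoding="latin-1", newline="\r") as f:
        for line in f:
            yield line.replace("\x00", "")


# Usage with a throwaway file (contents invented for illustration).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ab\x00c\rde\x00\rtail")
demo_lines = list(lines_without_nulls(tmp.name))
os.remove(tmp.name)
```

With `newline="\r"`, any \n bytes in the file are left alone, so this variant does not have the \r versus \n ambiguity mentioned above.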

Answers

I think that you are confused with what "for x in file" does. Assuming you got your handle like "file = open(file_name)", byte in this case will be an entire line, not a single character. So you are only calling yield when the entire line consists of a single carriage return. Try changing "byte" to "line" and iterating over that with a second loop.
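To see this for yourself, here is a small sketch, assuming Python 3 and using `io.BytesIO` as a stand-in for a file opened in "rb" mode (the sample bytes are made up). Iterating splits only at \n, so the \r characters stay buried inside the chunks:

```python
import io

# Made-up sample data: CR-separated records plus a single LF.
fake_file = io.BytesIO(b"one\rtwo\rthree\nfour")

# Iterating a binary file object yields chunks that end at b"\n" only.
chunks = list(fake_file)
```

Here `chunks` holds two items, not four, which is why a comparison of each item against a lone carriage return never matches.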

Perhaps if you were to explain what this file represents, why it has lots of "\x00" bytes, and why you think you need to read it in binary mode, we could help you with your underlying problem.

Otherwise, try the following code; it avoids any dependence on (or interference from) your operating system's line-ending convention.

lines = open("the_file", "rb").read().split("\r")
for line in lines:
    process(line)

Edit: the ASCII NUL (not "NULL") byte is "\x00".
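For what it's worth, on Python 3 a file read in "rb" mode gives you bytes rather than str, so the literals need a b prefix. A minimal sketch with invented sample data (the names `NUL`, `CR`, and `records` are just for illustration):

```python
# Made-up sample: CR-separated records padded with NUL bytes.
data = b"al\x00pha\rbe\x00ta\rgamma\x00"

NUL = b"\x00"  # a single NUL byte
CR = b"\r"     # a single carriage-return byte

# Drop every NUL, then split on carriage returns.
cleaned = data.replace(NUL, b"")
records = cleaned.split(CR)

# Indexing bytes gives an int in Python 3; slicing gives bytes.
first_as_int = data[0]
first_as_bytes = data[0:1]
```

The int-versus-bytes difference matters if you compare single bytes: `data[0] == b"a"` is False, while `data[0:1] == b"a"` is True.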

So, your problem is iterating over the lines of a file opened in binary mode that uses \r as a line separator. Since the file is in binary mode, you cannot use the universal newline feature, and it turns out that \r is not interpreted as a line separator in binary mode.

Reading a file char by char is a terribly inefficient thing to do in Python, but here's how you could iterate over your lines:

def cr_lines(the_file):
    line = []
    while True:
        byte = the_file.read(1)
        if not byte:
            break
        line.append(byte)
        if byte == '\r':
            yield ''.join(line)
            line = []
    if line:
        yield ''.join(line)

To be more efficient, you would need to read bigger chunks of text and handle buffering in your iterator. Keep in mind that you could get strange bugs if you seek while iterating. Preventing those bugs would require a subclass of file so you can purge the buffer on seek.
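A chunked version could be sketched roughly like this, assuming Python 3 and a file object opened in "rb" mode. The function name `cr_lines_chunked` and the `chunk_size` parameter are made up for the example, and it makes no attempt to handle seeking during iteration:

```python
import io


def cr_lines_chunked(the_file, chunk_size=4096):
    # Yield CR-terminated lines from a binary file, reading in chunks.
    pending = b""  # bytes read so far that do not yet form a full line
    while True:
        chunk = the_file.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last CR is complete lines; the rest waits.
        *complete, pending = pending.split(b"\r")
        for line in complete:
            yield line + b"\r"
    if pending:
        yield pending  # trailing data with no terminating CR


# io.BytesIO stands in for a real file; a tiny chunk size forces
# lines to straddle chunk boundaries.
sample = io.BytesIO(b"first\rsecond\rthird")
result = list(cr_lines_chunked(sample, chunk_size=4))
```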

Note the use of the ''.join(line) idiom. Accumulating a string with += has terrible performance and is a common mistake made by beginning programmers.
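The same idiom works on Python 3 with bytes; you just join with an empty bytes object instead of an empty string. A tiny sketch with made-up data:

```python
pieces = [b"a", b"b", b"\r"]  # made-up one-byte reads

# Build the line once at the end instead of growing it with += each time.
line = b"".join(pieces)
```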

Edit:

  • string1 += string2 string concatenation is slow. Try joining a list of strings.

  • ddaa is right: you shouldn't need the struct package if the binary file only contains ASCII. Also, my generator returns the string after the final \r, before EOF. With these two minor fixes, my code is suspiciously similar (practically identical) to this more recent answer.

Code snip:

def LineFeed(f):
    ret = []
    while True:
        oneByte = f.read(1)
        if not oneByte:
            break
        # Return everything up to, but not including, the carriage return
        if oneByte == '\r':
            yield ''.join(ret)
            ret = []
        else:
            ret.append(oneByte)
    # Yield whatever follows the final '\r', before EOF
    if ret:
        yield ''.join(ret)

if __name__ == '__main__':
    lf = LineFeed(open('filename', 'rb'))

    for something in lf:
        doSomething(something)



