English 中文(简体)
Fast file integrity verification
原标题:

Is there any fast algorithm that allows to compare two files (for verification purpose) without need to read the entire contents?

最佳回答

You could use a MD5 hash on both files and compare them that way. However it does technically read the whole file. You won t be able to have 100% certainty without checking I don t think.

In C# one would do this in the following way (sorry, you didn t mention a specific language):

protected string GetMD5HashFromFile(string fileName)
{
    byte[] retVal = { };

    using (FileStream file = new FileStream(fileName, FileMode.Open))
    using (MD5 md5 = new MD5CryptoServiceProvider())
    {
        retVal = md5.ComputeHash(file);
    }

    if (retVal.Length > 0)
    {
        StringBuilder sb = new StringBuilder();

        for (int i = 0; i < retVal.Length; i++)
        {
            sb.Append(retVal[i].ToString("x2"));
        }

        return sb.ToString();
    }
    else
    {
        return string.Empty;
    }
}

bool CompareFiles(string fileName1, string fileName2)
{
    return (GetMD5HashFromFile(fileName1) == GetMD5HashFromFile(fileName2));
}
问题回答

You could write a custom CRC procedure that reads bits of the file. e.g. 16 bytes for every 1k or something like that instead of CRCing the whole file. It s riskier, of course, since data could possibly change where you re not looking and not have an effect on your compared blocks. But CRC is a bit risky too since two very different data sets could return the same value.

There is no algorithm to be 100% sure the files are the same unless you read every byte. The proof is simple - assume such an algorithm exists, and we use it to compare two files. That implies some number of bytes are not read by the algorithm. I can cause the algorithm to fail by changing those bytes in one file but not the other.

It is mathematically impossible to determine that two files of the same size are equal without reading both of them entirely, but it is very possible to determine that they are not equal without reading both entirely. This can be done in various ways, such as using hash functions or short circuit comparison.

I m afraid you cannot avoid a full read of both files to be completely sure they re equals.

You can first check both file s size; if they re different, the files are different (but what about text files which would only differ on the line separator ?).

If size is the same, I don t see any correct way to go further but starting to read both files. Of course it can stop as soon as buffer differs, but it could only state that files are really equals when last character as been processed.





相关问题
Find the closest time from a list of times

So, here s the scenario. I have a file with a created time, and I want to choose a time from a list of times that that file s created time is closest or equal too...what would be the best way to ...

PHP Comparison Operators and Data Types

I m currently working through O Reilly s "Programming PHP" and have come across this table titled "Type of comparison performed by the comparison operators": First Operand | Second ...

Loop Until Condition Reached, iPhone

I have a problem here... After a button is pressed I want a loop to run until a condition is reached: - (IBAction)buttonclick1 ... if ((value2ForIf - valueForIf) >= 3) { ... I would like a loop ...

nuSoap or Zend Soap? [closed]

I would like to know the differences between nusoap and ZendSoap, which is the best? what benefits and disadvantages of each? Anyone who has used both technologies could make this comparison? Thank ...

Logic for a file compare

I trying to write a programm for file compare. For example: file1 1 2 3 4 5 file2 1 2 @ 3 4 5 If I do it line by line, I get: 1 == 1; 2 == 2; 3 != @; 4 != 3; 5 != 4; != 5; But, the truth is ...

热门标签