Speed of binary file comparisons in PowerShell

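There is a lot of discussion on the internet about how to compare files in PowerShell. For example:

However, nothing I have found discusses the differences in speed between the different methods of comparison.

(The exception is Kes Bakker and his FilesAreEqual. The title of his article claims that it is fast, but it does not say how fast, nor does it provide any data to support the claim. In my answer below, the buffered comparison function is adapted from his, and you will see that my data support his claim.)

This question is a follow-up to my earlier question, "PowerShell: Why are these file comparison times so different?". While doing more research, I compiled data on the speed of various methods of comparing binary files in PowerShell. I am posting this question so that I can answer it with those data. So, the question is:

How fast are the methods of comparing binary files in PowerShell?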

Answer

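The table below shows the speed of comparing 4 files with identical copies of themselves by 7 different methods. The 4 files were chosen because they were conveniently located in one path on my computer and in a corresponding path on an external SSD. They are all video files, which should not be relevant, except that something in their structure turns out to have a significant effect on one of the methods. The table shows the speed, in Mb/sec, of the comparison process for each method and each file. Speed was calculated by dividing the size of the file by the elapsed time of the process. The scripts used to run and time the comparisons are given further below.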
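The speed calculation described above can be sketched in PowerShell as follows. This is only an illustration: the function name Get-CompareSpeed and its parameters are hypothetical and are not part of the scripts below, and it assumes that 1 Mb means 1,048,576 bytes (PowerShell's 1MB literal), which the text does not state.

```powershell
# Hypothetical helper (not part of the original scripts): convert a file size
# in bytes and an elapsed time in seconds into a speed in Mb/sec.
# Assumption: 1 Mb is taken as 1,048,576 bytes; pass -BytesPerMb 1e6 for decimal units.
function Get-CompareSpeed {
    param(
        [Parameter(Mandatory)] [long]   $SizeBytes,
        [Parameter(Mandatory)] [double] $ElapsedSeconds,
        [double] $BytesPerMb = 1MB
    )
    if ($ElapsedSeconds -le 0) { throw "Elapsed time must be positive." }
    # Size in Mb divided by elapsed seconds, rounded to one decimal as in the table.
    [math]::Round(($SizeBytes / $BytesPerMb) / $ElapsedSeconds, 1)
}

# Illustrative numbers only (not measured data): a 100 Mb file compared in 4 seconds.
Get-CompareSpeed -SizeBytes (100 * 1MB) -ElapsedSeconds 4    # 25
```

The same arithmetic could be applied to the Size and tElapsed columns of the CSV file that the PowerShell script writes.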

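The columns of the table are:

Results: In all cases, the buffered method is fastest.

Because the files tested were identical, all the measurements necessarily compared every byte in the files. For files that are not necessarily identical, the Windows commands and the buffered method can quit as soon as a difference is found, and so would run faster. The compare-object methods compare the whole files even if the first byte differs.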
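To illustrate why stopping at the first difference matters, here is a minimal in-memory sketch. It is not the author's code: the function name Find-FirstDifferentChunk and the tiny chunk size are hypothetical, chosen only to make visible how many chunks get examined before the comparison can stop.

```powershell
# Hypothetical sketch: compare two byte arrays chunk by chunk and stop at the
# first chunk that contains a difference, reporting how many chunks were examined.
function Find-FirstDifferentChunk {
    param([byte[]] $Left, [byte[]] $Right, [int] $ChunkSize = 4)
    if ($Left.Length -ne $Right.Length) { throw "Arrays differ in length." }
    $chunksExamined = 0
    for ($offset = 0; $offset -lt $Left.Length; $offset += $ChunkSize) {
        $chunksExamined++
        $last = [math]::Min($offset + $ChunkSize, $Left.Length) - 1
        for ($i = $offset; $i -le $last; $i++) {
            if ($Left[$i] -ne $Right[$i]) {
                # Early exit: nothing after this chunk is read or compared.
                return [pscustomobject]@{ Same = $false; ChunksExamined = $chunksExamined }
            }
        }
    }
    [pscustomobject]@{ Same = $true; ChunksExamined = $chunksExamined }
}

$a = [byte[]](1..12)
$b = [byte[]](1..12)
(Find-FirstDifferentChunk $a $b).ChunksExamined    # 3 (identical, so every chunk is examined)
$b[1] = 99
(Find-FirstDifferentChunk $a $b).ChunksExamined    # 1 (stops in the first chunk)
```

With identical inputs, as in the measurements reported here, the early exit never triggers, so every method has to examine every byte.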

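| Size (Mb) | PS | Comp | FC | Compare-object | Compare raw | Compare as byte raw | Compare as byte read 0 | Buffered |
|---|---|---|---|---|---|---|---|---|
| 74 | - | 29.0 | 30.3 | | | | | |
| 74 | 5 | 29.2 | 30.2 | 4.1 | 18.7 | 0.5 | 0.5 | 35.2 |
| 74 | 7 | 29.2 | 30.9 | 3.4 | 20.7 | 1.2 | 0.9 | 36.5 |
| 66 | - | 25.3 | 26.2 | | | | | |
| 66 | 5 | 25.5 | 26.1 | 5.6 | 20.4 | 0.5 | 0.5 | 35.4 |
| 66 | 7 | 25.4 | 26.3 | 2.8 | 22.0 | 1.2 | 1.0 | 37.1 |
| 162 | - | 25.6 | 26.1 | | | | | |
| 162 | 5 | 25.5 | 26.5 | 15.0 | 18.7 | 0.5 | Error | 35.8 |
| 162 | 7 | 25.8 | 26.8 | 17.8 | 24.6 | 1.2 | 1.0 | 36.8 |
| 56 | - | 25.5 | 25.8 | | | | | |
| 56 | 5 | 25.5 | 26.0 | 21.6 | 3.0 | 0.5 | 0.5 | 35.2 |
| 56 | 7 | 26.0 | 26.5 | 17.6 | 25.1 | 1.3 | 1.1 | 36.0 |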

Table: Speed, in Mb/sec, of comparing four identical pairs of files (identified by their size in Mb) by seven methods running in Windows batch, in Windows PowerShell 5.1, and in PowerShell 7.

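Note that with the compare-object method, the third and fourth files run much faster than the first two. This is what my original question asked about, and it is explained in the answers there.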

Errors and warnings

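In the case shown as "Error" in the table (the largest file, in PS 5, with "Compare as byte read 0"), the process failed with an error message saying that the supported range was exceeded.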

As I noted elsewhere, the "compare" method with [...], when presented with a pair of files of 3.7 Gb.

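Warning: In preliminary tests, the results seemed to show that the Windows command FC was about seven times faster than the buffered method. I had used the buffered method to compare a 1 Tb folder with its backup, which took about 10 hours. Thinking that FC could do the job faster, I rewrote my script, repeated the comparison, and was baffled to find that it took 14 hours. Then I realised what had happened in the preliminary tests: when I ran the comparison with comp, Windows cached the files, so the comparison ran much faster when I repeated it with FC. In the results reported above, the measurements were made with an empty cache. I could not find a way to remove files from the cache, so each measurement was made immediately after rebooting the computer (and without doing anything else).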

Environment

Measurements were made on an AMD Ryzen 7 Pro 6850H processor with 32 Gb of RAM, running Windows 11 Pro 64. In each pair, one file was stored on the internal SSD and the other on an external USB SSD.

Code

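I would welcome feedback on improving these scripts, with two provisos. First, I know that the batch script is crude; it was thrown together quickly just to get the job done. More attention was paid to the design of the PowerShell script. Second, I know that my coding style is unconventional, but I have developed it over many years, and if you do not like it, I can only apologise. But if you see ways to improve how the scripts work, please say so.

I would also be interested to learn whether others run the scripts and get results similar to or different from mine.

Batch script for comp and FC: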

rem Script: "measure speed - comp.bat"
rem Measure the time taken to compare two files using "comp" running in a Windows batch script.
rem To ensure that none of the files is in cache, run this immediately after booting the computer.

time < nul
comp /m "<path 1><file 1>" "<path 2><file 1>"
time < nul
comp /m "<path 1><file 2>" "<path 2><file 2>"
time < nul
comp /m "<path 1><file 3>" "<path 2><file 3>"
time < nul
comp /m "<path 1><file 4>" "<path 2><file 4>"
time < nul

The console output was copied and pasted into Excel, where consecutive times were subtracted to get the elapsed time of each process. The batch script for FC was the same, with comp /m replaced by FC /b.
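The subtraction of consecutive times could also be done in PowerShell rather than Excel. The sketch below is an assumption-laden illustration: the function name Get-ElapsedSeconds is hypothetical, and it assumes the time strings have been extracted from the console output in the form hh:mm:ss.ff (the exact format printed by "time" depends on regional settings).

```powershell
# Hypothetical helper: elapsed seconds between two time-of-day strings such as
# those printed by "time < nul". Assumes the form hh:mm:ss.ff.
function Get-ElapsedSeconds {
    param([string] $StartTime, [string] $EndTime)
    $inv   = [cultureinfo]::InvariantCulture
    $start = [timespan]::Parse($StartTime.Trim(), $inv)
    $end   = [timespan]::Parse($EndTime.Trim(), $inv)
    $elapsed = $end - $start
    # If the run crossed midnight the difference is negative, so add one day.
    if ($elapsed -lt [timespan]::Zero) { $elapsed += [timespan]::FromDays(1) }
    [math]::Round($elapsed.TotalSeconds, 2)
}

# Illustrative values only (not measured data):
Get-ElapsedSeconds '14:03:27.45' '14:03:30.05'    # 2.6
Get-ElapsedSeconds '23:59:59.00' '00:00:01.50'    # 2.5
```

The resolution is limited to the hundredths of a second that the batch output provides.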

PowerShell script, including function bFilesCompareBinary:

# measure-speed-of-file-comparisons.ps1

# Set the $sFolder_n to a pair of folders with identical content. This script will measure and record, 
#     by one of eight different methods, the time taken to verify that all the files are identical.
# To ensure that none of the files is in cache, run this immediately after booting the computer.

# On use of get-content parameters "-encoding byte", "-AsByteStream", "-raw", and "-ReadCount 0":
#     www.jonathanmedd.net/2017/12/powershell-core-does-not-have-encoding-byte.-replaced-with-new-parameter-asbytestream.html/
#     www.powershellmagazine.com/2014/03/17/pstip-reading-file-content-as-a-byte-array/
#     www.github.com/PowerShell/PowerShell/issues/11266
#     www.github.com/MicrosoftDocs/PowerShell-Docs/issues/3215

# Calls to get-content with as-byte parameters are wrapped in an array ("@(, )") per instructions in
#     www.stackoverflow.com/questions/76842081/powershell-why-is-this-timing-not-working/#76843506

# =========================================================================
# Manually set these paths before running:
# =========================================================================
$sFolder_1 = "<path to first folder, including final \>"
$sFolder_2 = "<path to second folder, including final \>"
$sOutputFilespec = "<filespec of output csv file>"

# =========================================================================
# Function bFilesCompareBinary()
# =========================================================================
function bFilesCompareBinary ([System.IO.FileInfo] $oFile_1, [System.IO.FileInfo] $oFile_2, `
                              [uint32] $nBufferSize = 524288, $sRetIfSame = "Same", $sRetIfDif = "Dif")
   {# Return message for whether two given files are identical by binary comparison, or error description.
    #    Assumes the files are the same size, else error.
    
    # From "www.stackoverflow.com/questions/19990788/powershell-binary-file-comparison#22800663"
    #    But comment by @mclayton on "www.stackoverflow.com/questions/76842081/powershell-why-is-this-timing-not-working/#76843506"
    #        warns that .read() does not always get all the bytes requested, so I've added a test for that.
    # FileInfo Class:   "https://learn.microsoft.com/en-us/dotnet/api/system.io.fileinfo"
    # FileStream Class: "https://learn.microsoft.com/en-us/dotnet/api/system.io.filestream"

    if ($nBufferSize -eq 0) {$nBufferSize = 524288}

    try{$oStream_1 = $oFile_1.OpenRead()
        $oStream_2 = $oFile_2.OpenRead()

        $oBuffer_1 = New-Object byte[] $nBufferSize
        $oBuffer_2 = New-Object byte[] $nBufferSize

        # Property values need $() to be expanded inside a double-quoted string.
        if ($oFile_1.Length -ne $oFile_2.Length) {throw "Files are different sizes: $($oFile_1.Length) , $($oFile_2.Length)"}
        $nBytesLeft = $oFile_1.Length
        $bDifferenceFound = $false
        $sError = ""

        do {$nBytesToGet = [math]::Min($nBytesLeft, $nBufferSize)
            $nBytesRead_1 = $oStream_1.read($oBuffer_1, 0, $nBytesToGet)
            $nBytesRead_2 = $oStream_2.read($oBuffer_2, 0, $nBytesToGet)
            if ($nBytesRead_1 -ne $nBytesRead_2) {throw "Different byte count each file: $nBytesRead_1 , $nBytesRead_2"}
            if ($nBytesRead_1 -ne $nBytesToGet) {throw "Byte count different from requested: $nBytesRead_1 , $nBytesToGet"}
            $nBytesLeft -= $nBytesRead_1
            if (-not [System.Linq.Enumerable]::SequenceEqual($oBuffer_1, $oBuffer_2)) {$bDifferenceFound = $true}
            } while ((-not $bDifferenceFound) -and $nBytesLeft -gt 0)
        }

    catch {$sError = "Error: $_"}

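    # Close only the streams that were actually opened (OpenRead may have failed).
    finally {if ($oStream_1) {$oStream_1.Close()} ; if ($oStream_2) {$oStream_2.Close()}}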

    if ($sError -ne "") {return $sError}
      elseif ($bDifferenceFound) {return $sRetIfDif}
      else {return ($sRetIfSame)}
    }

# =========================================================================
# User interaction
# =========================================================================
$bBooted = (read-host ("Did you boot the computer immediately before running this? (Enter ""Y"" or ""N"".)")).ToUpper()
$sPSenv = (read-host ("PowerShell environment: Enter ""D"" if running directly in Windows or ""S"" if in scripting environment (ISE or VS Code)")).ToUpper()
$nMethod = read-host ("Comparison method: Enter 1 for comp, 2 for FC, 3 for compare-object, 4 for compare raw, " + `
                            # "5 for compare as byte, " + `
                            "6 for compare as byte raw, 7 for compare as byte read 0, or 8 for buffered")
switch ($nMethod) {1 {$sMethod = "comp"}                   2 {$sMethod = "FC"}
                   3 {$sMethod = "compare-object"}         4 {$sMethod = "compare raw"}
                   5 {$sMethod = "compare as byte"}        6 {$sMethod = "compare as byte raw"}
                   7 {$sMethod = "compare as byte read 0"} 8 {$sMethod = "buffered"}}

# =========================================================================
# Scan the folders and compare files.
# =========================================================================
$nLen_1 = $sFolder_1.Length
$PSversion = $PSVersionTable.PSVersion.Major
get-ChildItem -path $sFolder_1 -Recurse | ForEach-Object `
   {$oItem_1 = $_
    $sItem_1 = $oItem_1.FullName

    # If it's a file, compare in both folders:
    if (Test-Path -Type Leaf $sItem_1) `
       {$nSize_1   = $oItem_1.Length
        $sItem_rel = $sItem_1.Substring($nLen_1)
        $sItem_2   = join-path $sFolder_2 $sItem_rel
        $oItem_2   = get-item $sItem_2
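        # Native commands set the global $LASTEXITCODE; assigning without "global:" in a script
        #     creates a script-scoped copy that would hide the real exit code.
        $global:LASTEXITCODE = 99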
        $nMid = ""
        write-output "Check $sItem_rel"
        $dStart = $(get-date)
        switch ($nMethod)
           {{$_ -in 1, 2}
                {switch ($nMethod)
                   {1 {comp /m "$sItem_1" "$sItem_2"}
                    2 {FC.exe /b "$sItem_1" "$sItem_2"}}
                 switch ($LastExitCode) {0 {$sResult = "Same"} 1 {$sResult = "Dif"} default {$sResult = "Error: $LastExitCode"}}}
            {$_ -in 3, 4, 5, 6, 7}
                {switch ($nMethod)
                   {3 {$oContent_1 = (get-content $sItem_1)
                       $oContent_2 = (get-content $sItem_2)}
                    4 {$oContent_1 = (get-content $sItem_1 -raw)
                       $oContent_2 = (get-content $sItem_2 -raw)}
                    {$_ -in 5, 6, 7}
                        {switch ($PSversion)
                            {5 {switch ($nMethod)
                                   {5 {$oContent_1 = @(, (get-content $sItem_1 -encoding byte))
                                       $oContent_2 = @(, (get-content $sItem_2 -encoding byte))}
                                    6 {$oContent_1 = @(, (get-content $sItem_1 -encoding byte -raw))
                                       $oContent_2 = @(, (get-content $sItem_2 -encoding byte -raw))}
                                    7 {$oContent_1 = @(, (get-content $sItem_1 -encoding byte -ReadCount 0))
                                       $oContent_2 = @(, (get-content $sItem_2 -encoding byte -ReadCount 0))}
                                }   }
                             7 {switch ($nMethod)
                                   {5 {$oContent_1 = @(, (get-content $sItem_1 -AsByteStream))
                                       $oContent_2 = @(, (get-content $sItem_2 -AsByteStream))}
                                    6 {$oContent_1 = @(, (get-content $sItem_1 -AsByteStream -raw))
                                       $oContent_2 = @(, (get-content $sItem_2 -AsByteStream -raw))}
                                    7 {$oContent_1 = @(, (get-content $sItem_1 -AsByteStream -ReadCount 0))
                                       $oContent_2 = @(, (get-content $sItem_2 -AsByteStream -ReadCount 0))}
                                }   }
                             default {$sResult = "Error: PowerShell version is $PSversion"}
                    }    }   }
                 $nMid = ($(get-date) - $dStart).Ticks / 1e7
                 if (compare-object $oContent_1 $oContent_2) `
                    {$sResult = "Dif"} else {$sResult = "Same"}}
            8 {$sResult = bFilesCompareBinary $oItem_1 $oItem_2}
            }
        $nElapsed = ($(get-date) - $dStart).Ticks / 1e7
        # Note: $nItem is never assigned anywhere in this script, so the Item column is always empty.
        $oOutput = [PSCustomObject]@{Booted = $bBooted ; PSversion = $PSversion ; PSenv = $sPSenv ; Method = $sMethod    ; Item = $nItem         ; Result = $sResult
                                     Size = $nSize_1   ; tStart = $dStart       ; tMid = $nMid    ; tElapsed = $nElapsed ; Filespec = $sItem_rel}
        Export-Csv -InputObject $oOutput -Path $sOutputFilespec -Append -NoTypeInformation
    }   }

# =========================================================================
# End of script
# =========================================================================
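On the comment in bFilesCompareBinary about .read() not always returning all the bytes requested: the function above treats a short read as an error. An alternative, sketched here and not the author's method, is to keep reading until the requested number of bytes has arrived or the stream ends. The helper name Read-FullChunk is hypothetical.

```powershell
# Hypothetical helper: call Stream.Read repeatedly until $Count bytes have been
# placed in $Buffer or the end of the stream is reached. Returns the bytes read.
function Read-FullChunk {
    param([System.IO.Stream] $Stream, [byte[]] $Buffer, [int] $Count)
    $total = 0
    while ($total -lt $Count) {
        $n = $Stream.Read($Buffer, $total, $Count - $total)
        if ($n -eq 0) { break }    # Read returns 0 only at end of stream
        $total += $n
    }
    return $total
}

# Small in-memory demonstration with a 10-byte stream and a 4-byte buffer.
$stream = [System.IO.MemoryStream]::new([byte[]](1..10))
$buffer = [byte[]]::new(4)
Read-FullChunk $stream $buffer 4    # 4
Read-FullChunk $stream $buffer 4    # 4
Read-FullChunk $stream $buffer 4    # 2 (only two bytes were left)
$stream.Close()
```

In the comparison loop, such a helper would replace the two direct .read() calls, and the check against $nBytesToGet would then only fail if a file really ended early. Whether this changes the measured speed has not been tested here.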
