Question

I want a program that writes a sequence like

1
...
10000000

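to a file. What is the simplest code to write that still gets decent performance? My intuition is that there is some lack-of-buffering problem. My C code runs at 100 MB/s, whereas, for reference, dd runs at 3 GB/s (sorry for the imprecision, see the comments; I'm more interested in the big-picture orders of magnitude).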

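One would think this would be a solved problem by now, i.e. that any modern compiler would make it immediate to write such programs that perform reasonably well.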

C code

#include <stdio.h>

int main(int argc, char **argv) {
    int len = 10000000;
    for (int a = 1; a <= len; a++) {
        printf("%d\n", a);
    }
    return 0;
}

I'm compiling with clang -O3. A performance skeleton which calls putchar('\n') 8 times gets comparable performance.

Haskell code

A naive Haskell implementation runs at 13 MiB/s, compiled with ghc -O2 -optc-O3 -optc-ffast-math -fllvm -fforce-recomp -funbox-strict-fields. (I haven't recompiled my libraries with -fllvm; perhaps I need to do that.) Code:

import Control.Monad
main = forM [1..10000000 :: Int] $ \j -> putStrLn (show j)

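My best attempt in Haskell is barely faster, at 17 MiB/s. The problem is that I can't find a good way to convert Vectors into ByteStrings (perhaps there is a solution using iteratees?).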

import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector, Unbox, (!))
import qualified System.IO

writeVector :: (Unbox a, Show a) => Vector a -> IO ()
writeVector v = V.mapM_ (System.IO.putStrLn . show) v

main = writeVector (V.generate 10000000 id)

It seems that writing ByteStrings is fast, as this code demonstrates:

import qualified Data.ByteString.Char8 as B
main = B.putStrLn (B.replicate 76000000 '\n')

This gets 1.3 GB/s, which isn't as fast as dd, but obviously much better.

Answer 1

Some completely unscientific benchmarking first:

All programmes were compiled with the default optimisation level (-O3 for gcc, -O2 for GHC) and run with

time ./prog > outfile

As a baseline, the C programme took 1.07s to produce a ~76MB (78888897 bytes) file, roughly 70MB/s throughput.
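
The file size and the throughput figure can be checked with a little arithmetic. The following is only a sketch; the helper names lineBytes, totalBytes and mibPerSec are made up for illustration and do not come from any of the benchmarked programmes.

```haskell
import Data.List (foldl')

-- Bytes needed to print n in decimal, plus one byte for the newline.
lineBytes :: Int -> Int
lineBytes n = length (show n) + 1

-- Total output size for the lines 1 .. limit (strict left fold).
totalBytes :: Int -> Int
totalBytes limit = foldl' (\acc n -> acc + lineBytes n) 0 [1 .. limit]

-- Throughput in MiB/s for a byte count and a wall-clock time in seconds.
mibPerSec :: Int -> Double -> Double
mibPerSec bytes secs = fromIntegral bytes / secs / (1024 * 1024)

main :: IO ()
main = do
    let bytes = totalBytes 10000000
    print bytes                   -- 78888897
    print (mibPerSec bytes 1.07)  -- about 70
```

The same two helpers reproduce the other throughput figures below when given the corresponding run times.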

1. The "naive" Haskell programme (forM [1 .. 10000000] $ \j -> putStrLn (show j)) took 8.64s, about 8.8MB/s.
2. The same with forM_ instead of forM took 5.64s, about 13.5MB/s.
3. The ByteString version from dflemstr's answer took 9.13s, about 8.3MB/s.
4. The Text version from dflemstr's answer took 5.64s, about 13.5MB/s.
5. The Vector version from the question took 5.54s, about 13.7MB/s.
6. main = mapM_ (C.putStrLn . C.pack . show) $ [1 :: Int .. 10000000], where C is Data.ByteString.Char8, took 4.25s, about 17.9MB/s.
7. putStr . unlines . map show $ [1 :: Int .. 10000000] took 3.06s, about 24.8MB/s.
8. A manual loop,

main = putStr $ go 1
  where
    go :: Int -> String
    go i
        | i > 10000000 = ""
        | otherwise = shows i . showChar '\n' $ go (i+1)

took 2.32s, about 32.75MB/s.

9. main = putStrLn $ replicate 78888896 'a' took 1.15s, about 66MB/s.
10. main = C.putStrLn $ C.replicate 78888896 'a', where C is Data.ByteString.Char8, took 0.143s, about 530MB/s, with roughly the same figures for lazy ByteStrings.

What can we learn from that?

First, don't use forM or mapM unless you really want to collect the results. Performance-wise, that is a trap.
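
To make the difference concrete, here is a small sketch. The names collected and discarded are only for illustration; the Maybe monad is used so that the collected result is visible as an ordinary value.

```haskell
import Control.Monad (forM, forM_)

-- forM keeps every result and returns them as a list.
collected :: Maybe [Int]
collected = forM [1, 2, 3] Just

-- forM_ runs the same actions but throws each result away.
discarded :: Maybe ()
discarded = forM_ [1, 2, 3 :: Int] Just

main :: IO ()
main = do
    print collected
    print discarded
    -- In IO, forM would build a list of ten million () values here;
    -- forM_ never allocates that list.
    forM_ [1 .. 5 :: Int] $ \j -> putStrLn (show j)
```

For printing, nothing is ever done with the list of () values, so forM_ (or mapM_) is the appropriate choice.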

Then, ByteString output can be very fast (10.), but if constructing the ByteString to output is slow (3.), you end up with code that is slower than the naive String output.

What's so terrible about 3.? Well, all the ByteStrings involved are very short. So you get a list of

Chunk "1234567" Empty

and between any two such, a Chunk "\n" Empty is put. Then the resulting list is concatenated, which means all these Emptys are tossed away when a ... (Chunk "1234567" (Chunk "\n" (Chunk "1234568" (...)))) is built. That's a lot of wasteful construct-deconstruct-reconstruct going on. Speed comparable to that of the Text version and the fixed "naive" String version can be achieved by packing to strict ByteStrings and using fromChunks (and Data.List.intersperse for the newlines). Better performance, slightly better than 6., can be obtained by eliminating the costly singletons. If you glue the newlines to the Strings, using \k -> shows k "\n" instead of show, the concatenation has to deal with half as many, slightly longer ByteStrings, which pays off.
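
A minimal sketch of the last variant, gluing the newline to each number and assembling the strict chunks with fromChunks, might look like this. The function name numberLines is made up for illustration, and this is not the exact code that was timed.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- One strict chunk per number, with the newline already attached,
-- so no separate one-byte chunks have to be created and merged.
numberLines :: Int -> L.ByteString
numberLines limit =
    L.fromChunks (map (\k -> S.pack (shows k "\n")) [1 .. limit])

main :: IO ()
main = L.putStr (numberLines 10000000)
```

Because fromChunks is lazy in its list argument, the chunks are still produced on demand while the output is being written.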

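I'm not familiar enough with the internals of either text or vector to offer more than a semi-educated guess about the reasons for the observed performance, so I'll leave them out. Suffice it to say that the performance gain is marginal at best compared to the fixed naive String version.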

Now, 6. shows that ByteString output is faster than String output, enough that in this case the additional work of packing is more than compensated. However, don't be fooled by that into believing it is always so. If the Strings to pack are long, the packing can take more time than the String output.

But ten million invocations of putStrLn, be it the String or the ByteString version, take a lot of time. It's faster to grab the stdout Handle just once and construct the output String in non-IO code. unlines already does well, but we still suffer from the construction of the list map show [1 .. 10^7]. Unfortunately, the compiler didn't manage to eliminate that (but it eliminated [1 .. 10^7], which is already pretty good). So let's do it ourselves, leading to 8. That's not too terrible, but it still takes more than twice as long as the C programme.

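One can make the Haskell programme faster by going low-level and directly filling ByteStrings without going through String via show, but I don't know whether the C speed is reachable. Anyway, that low-level code isn't very pretty, so I'll spare you what I have, but sometimes one has to get one's hands dirty if speed matters.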

Answer 2

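Using lazy byte strings gives you some buffering, because the string is written as soon as it is available and more numbers are only produced as they are needed. This code shows the basic idea (some further optimizations may be possible):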

import qualified Data.ByteString.Lazy.Char8 as ByteString

main =
  ByteString.putStrLn .
  ByteString.intercalate (ByteString.singleton '\n') .
  map (ByteString.pack . show) $
  ([1..10000000] :: [Int])

I still use Strings for the numbers here, which leads to a noticeable slowdown. If we switch to the text library instead of the bytestring library, we get access to "native" show functions for ints, and can do this:

import Data.Monoid
import Data.List
import Data.Text.Lazy.IO as Text
import Data.Text.Lazy.Builder as Text
import Data.Text.Lazy.Builder.Int as Text

main :: IO ()
main =
  Text.putStrLn .
  Text.toLazyText .
  mconcat .
  intersperse (Text.singleton '\n') .
  map Text.decimal $
  ([1..10000000] :: [Int])

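I don't know how you are measuring the "speed" of these programs (with the pv tool?), but I imagine that one of these approaches will be the fastest trivial program you can get.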

Answer 3

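If you are going for maximal performance, it helps to take a holistic view; i.e., you want to write a function that maps from [Int] to a series of system calls that write chunks of memory to a file.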

Lazy bytestrings are a good representation for a sequence of chunks of memory. Mapping a lazy bytestring to a series of system calls that write chunks of memory is what L.hPut does (assuming an import qualified Data.ByteString.Lazy as L). Hence, we just need a means to efficiently construct the corresponding lazy bytestring. This is what lazy bytestring builders are good at. With the new bytestring builder (here is the API documentation), the following code does the job.

import qualified Data.ByteString.Lazy          as L
import           Data.ByteString.Lazy.Builder       (toLazyByteString, charUtf8)
import           Data.ByteString.Lazy.Builder.ASCII (intDec)
import           Data.Foldable                      (foldMap)
import           Data.Monoid                        (mappend)
import           System.IO                          (openFile, IOMode(..))

main :: IO ()
main = do
    h <- openFile "/dev/null" WriteMode
    L.hPut h $ toLazyByteString $
        foldMap ((charUtf8 '\n' `mappend`) . intDec) [1..10000000]

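Note that I output to /dev/null to avoid interference by the disk driver. The effort of moving the data to the OS remains the same. On my machine, the above code runs in 0.45 seconds, which is 12 times faster than the 5.4 seconds of your original code. This implies a throughput of 168 MB/s. We can squeeze out an additional 30% of speed (220 MB/s) using bounded encodings.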

import qualified Data.ByteString.Lazy.Builder.BasicEncoding as E

L.hPut h $ toLazyByteString $
    E.encodeListWithB
        ((\x -> (x, '\n')) E.>$< E.intDec `E.pairB` E.charUtf8)
        [1..10000000]

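Their syntax looks a bit quirky because a BoundedEncoding a specifies the conversion of a Haskell value of type a to a bounded-length sequence of bytes such that the bound can be computed at compile time. This allows functions such as E.encodeListWithB to perform some additional optimizations when implementing the actual filling of the buffer. See the documentation of Data.ByteString.Lazy.Builder.BasicEncoding in the above link to the API documentation (sigh, stupid hyperlink limit for new users) for more information.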

Here is the source of all my benchmarks.

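The conclusion is that we can get very good performance from a declarative solution, provided that we understand the cost model of our implementation and use the right data structures. Whenever you construct a packed sequence of values (e.g., a sequence of bytes represented as a bytestring), the right data structure to use is a bytestring Builder.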
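
For readers on a more recent bytestring release, the same idea can be sketched with the module Data.ByteString.Builder. This assumes a bytestring version that ships that module and a base where <> is in the Prelude. The function name numbers is made up for illustration, and this variant has not been timed. Unlike the code above, it puts the newline after each number rather than before it, and it writes to stdout.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as L
import System.IO (BufferMode (BlockBuffering), hSetBinaryMode, hSetBuffering, stdout)

-- Each number is rendered straight into the builder's buffer, followed by
-- a newline; no intermediate String is created.
numbers :: Int -> B.Builder
numbers limit = foldMap (\k -> B.intDec k <> B.char7 '\n') [1 .. limit]

main :: IO ()
main = do
    hSetBinaryMode stdout True
    hSetBuffering stdout (BlockBuffering Nothing)
    -- hPutBuilder fills the handle's own buffer directly.
    B.hPutBuilder stdout (numbers 10000000)
    -- For small inputs the result can be inspected as a lazy bytestring:
    -- L.unpack (B.toLazyByteString (numbers 3))
```

Setting block buffering matters here because hPutBuilder works with the handle's buffer; with an unbuffered handle it has nothing to fill.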
