Some completely unscientific benchmarking first:
All programmes were compiled with the default optimisation level (-O3 for gcc, -O2 for GHC) and run with
time ./prog > outfile
As a baseline, the C programme took 1.07s to produce a ~76MB (78888897 bytes) file, roughly 70MB/s throughput.
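The file size and the throughput figure can be cross-checked with a few lines of Haskell. This is only a sketch: the names `outputBytes` and `mibPerSecond` are made up for illustration, and it assumes the output is the numbers 1 to 10000000 in decimal, one per line. The result suggests that the MB figures here are in units of 2^20 bytes.

```haskell
-- Bytes written when printing 1..n in decimal, one number per line:
-- the digits of each number plus one newline character.
outputBytes :: Int -> Int
outputBytes n = sum [length (show k) + 1 | k <- [1 .. n]]

-- Throughput in units of 2^20 bytes per second (hypothetical helper).
mibPerSecond :: Int -> Double -> Double
mibPerSecond bytes seconds = fromIntegral bytes / seconds / (1024 * 1024)

main :: IO ()
main = do
  let bytes = outputBytes 10000000
  print bytes                      -- 78888897
  print (mibPerSecond bytes 1.07)  -- roughly 70.3
```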
1. The "naive" Haskell programme (`forM [1 .. 10000000] $ \j -> putStrLn (show j)`) took 8.64s, about 8.8MB/s.
2. The same with `forM_` instead of `forM` took 5.64s, about 13.5MB/s.
3. The `ByteString` version from dflemstr's answer took 9.13s, about 8.3MB/s.
4. The `Text` version from dflemstr's answer took 5.64s, about 13.5MB/s.
5. The `Vector` version from the question took 5.54s, about 13.7MB/s.
6. `main = mapM_ (C.putStrLn . C.pack . show) $ [1 :: Int .. 10000000]`, where `C` is `Data.ByteString.Char8`, took 4.25s, about 17.9MB/s.
7. `putStr . unlines . map show $ [1 :: Int .. 10000000]` took 3.06s, about 24.8MB/s.
8. A manual loop,

       main = putStr $ go 1
         where
           go :: Int -> String
           go i
             | i > 10000000 = ""
             | otherwise = shows i . showChar '\n' $ go (i+1)

   took 2.32s, about 32.75MB/s.
9. `main = putStrLn $ replicate 78888896 'a'` took 1.15s, about 66MB/s.
10. `main = C.putStrLn $ C.replicate 78888896 'a'`, where `C` is `Data.ByteString.Char8`, took 0.143s, about 530MB/s, with roughly the same figures for lazy `ByteString`s.
What can we learn from that?
First, don't use `forM` or `mapM` unless you really want to collect the results. Performance-wise, that's a pitfall.
Then, `ByteString` output can be very fast (10.), but if constructing the `ByteString` to output is slow (3.), you end up with code that is slower than the "naive" `String` output.
What's so terrible about 3.? Well, all the `String`s involved are very short. So you get a list of
    Chunk "1234567" Empty

and between any two such, a `Chunk "\n" Empty` is put, then the resulting list is concatenated, which means all these `Empty`s are tossed away when a `... (Chunk "1234567" (Chunk "\n" (Chunk "1234568" (...))))` is built. That's a lot of wasteful construct-deconstruct-reconstruct going on. Speed comparable to that of the `Text` and the fixed "naive" `String` version can be achieved by `pack`ing to strict `ByteString`s and using `fromChunks` (and `Data.List.intersperse` for the newlines). Better performance, slightly better than 6., can be obtained by eliminating the costly singletons. If you glue the newlines to the `String`s, using `\k -> shows k "\n"` instead of `show`, the concatenation has to deal with half as many slightly longer `ByteString`s, which pays off.
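The glued-newline variant described above might look like the following minimal sketch. The name `numberLines` is made up for illustration, the code is not taken from either answer, and no timing is claimed for it.

```haskell
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy.Char8 as L

-- One strict chunk per number, with the newline already attached,
-- so no separate one-byte "\n" chunks have to be created and merged.
numberLines :: Int -> L.ByteString
numberLines n = L.fromChunks [C.pack (shows k "\n") | k <- [1 .. n]]

main :: IO ()
main = L.putStr (numberLines 10000000)
```

Because `fromChunks` builds the lazy `ByteString` directly from the strict pieces, the list of chunks is consumed lazily as it is written out.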
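I'm not familiar enough with the internals of either `text` or `vector` to offer more than a semi-educated guess about the reasons for the observed performance, so I'll leave them out. Suffice it to say that the performance gain is marginal at best compared to the fixed naive `String` version.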
Now, 6. shows that `ByteString` output is faster than `String` output, enough that in this case the additional work of `pack`ing is more than compensated. However, don't be fooled by that into believing that this is always so. If the `String`s to pack are long, the packing can take more time than the `String` output.
But ten million invocations of `putStrLn`, be it the `String` or the `ByteString` version, take a lot of time. It's faster to grab the `stdout` `Handle` just once and construct the output `String` in non-IO code. `unlines` already does well, but we still suffer from the construction of the list `map show [1 .. 10^7]`. Unfortunately, the compiler didn't manage to eliminate that (but it eliminated `[1 .. 10^7]`, which is already pretty good). So let's do it ourselves, leading to 8. That's not too terrible, but still takes more than twice as long as the C programme.
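One can make a faster Haskell programme by going low-level and directly filling `ByteString`s without going through `String` via `show`, but I don't know whether C speed is reachable. Anyway, that low-level code isn't very pretty, so I'll spare you what I have, but sometimes one has to get one's hands dirty if speed matters.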
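The low-level code itself is not shown here. As a different, higher-level way of avoiding the detour through `String`, the `bytestring` package provides `Data.ByteString.Builder`, which writes decimal digits straight into output buffers. The sketch below is an assumption about how that could be applied to this problem, not the code referred to above; the names `numbers` and `renderNumbers` are made up, it assumes a reasonably recent GHC where `<>` is in the Prelude, and it has not been measured against the C programme.

```haskell
import Data.ByteString.Builder (Builder, char7, hPutBuilder, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as L
import System.IO (stdout)

-- Each number is encoded as decimal digits followed by a newline,
-- without ever building an intermediate String.
numbers :: Int -> Builder
numbers n = foldMap (\k -> intDec k <> char7 '\n') [1 .. n]

-- Pure helper for inspecting the output (hypothetical, for illustration).
renderNumbers :: Int -> L.ByteString
renderNumbers = toLazyByteString . numbers

main :: IO ()
main = hPutBuilder stdout (numbers 10000000)
```

`hPutBuilder` takes the `stdout` `Handle` once and fills its buffer directly, which matches the advice above about not invoking `putStrLn` ten million times.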