Question

I am trying to serialize a simple (but quite big) tree structure to a binary file using Haskell. The structure looks like this:

-- For simplicity assume each Node has only 4 children
data Tree = Node [Tree] | Leaf [Int]

And here is how I need the data to look on disk:

Each node starts with four 32-bit offsets to its children, then the children follow.
I don't care much about the leaves, let's say a leaf is just n consecutive 32-bit numbers.
For practical purposes I would need some node labels or some other additional data, but right now I don't care much about that either.
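To make the layout concrete, here is a minimal sketch of the arithmetic it implies. The names byteSize and childOffsets are made up for illustration, and the offsets are assumed to be measured from the start of the node, which the description above does not actually specify.

```haskell
-- Tree type from the question.
data Tree = Node [Tree] | Leaf [Int]

-- Size in bytes of a serialized subtree: a node is one 32-bit offset per
-- child followed by the children; a leaf is n consecutive 32-bit numbers.
byteSize :: Tree -> Int
byteSize (Leaf ns) = 4 * length ns
byteSize (Node ts) = 4 * length ts + sum (map byteSize ts)

-- Offsets of a node's children, measured from the start of that node
-- (an assumption; the layout could equally use absolute file offsets).
childOffsets :: [Tree] -> [Int]
childOffsets ts = init (scanl (+) (4 * length ts) (map byteSize ts))
```

Note that this recomputes subtree sizes at every level, so it only illustrates the numbers involved; anything practical would compute each size once and keep it.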

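It seems to me that the first choice of Haskellers for writing binary files is the Data.Binary.Put library. But with that I have a problem with the first point above. In particular, when I am about to write a Node to the file, I need to know my current offset and the size of each child in order to write down the child offsets.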

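This is not something Data.Binary.Put provides, so I thought this must be a perfect application of monad transformers. But even though it sounds cool and functional, so far I have not succeeded with this approach.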

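I asked two other questions that I thought would help me solve the problem: here and here. I must say that each time I received very nice answers that helped me make further progress, but unfortunately I am still unable to solve the problem as a whole.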

Here is what I have got so far; it still leaks too much memory to be practical.

I would love to have a solution that uses such a functional approach, but I would be grateful for any other solution as well.

Answer 1

Here is an implementation of the two-pass solution proposed by sclv.

import qualified Data.ByteString.Lazy as L
import Data.Binary.Put
import Data.Word
import Data.List (foldl')

data Tree = Node [Tree] | Leaf [Word32] deriving Show

makeTree 0 = Leaf $ replicate 100 0xdeadbeef
makeTree n = Node $ replicate 4 $ makeTree $ n-1

SizeTree mimics the original Tree: it does not contain the data, but each of its nodes stores the serialized size of the corresponding subtree of the Tree.
We need to keep the SizeTree in memory, so it is worth making it more compact (e.g. by replacing the Ints with unboxed words).

data SizeTree
  = SNode {sz :: Int, chld :: [SizeTree]}
  | SLeaf {sz :: Int}
  deriving Show

With the SizeTree in memory, it is possible to serialize the original Tree in a streaming fashion.

putTree :: Tree -> SizeTree -> Put
putTree (Node xs) (SNode _ ys) = do
  putWord8 $ fromIntegral $ length xs          -- number of children
  mapM_ (putWord32be . fromIntegral . sz) ys   -- sizes of children
  sequence_ [putTree x y | (x,y) <- zip xs ys] -- children data
putTree (Leaf xs) _ = do
  putWord8 0                                   -- zero means 'leaf'
  putWord32be $ fromIntegral $ length xs       -- data length
  mapM_ putWord32be xs                         -- leaf data

mkSizeTree :: Tree -> SizeTree
mkSizeTree (Leaf xs) = SLeaf (1 + 4 + 4 * length xs)
mkSizeTree (Node xs) = SNode (1 + 4 * length xs + sum' (map sz ys)) ys
  where
    ys = map mkSizeTree xs
    sum' = foldl' (+) 0   -- strict sum, avoids building a chain of thunks

It is important to prevent GHC from merging the two passes into one (in which case it would hold the whole tree in memory). Here this is done by passing the function a tree generator rather than the tree itself.

serialize mkTree size = runPut $ putTree (mkTree size) treeSize
  where
    treeSize = mkSizeTree $ mkTree size

main = L.writeFile "dump.bin" $ serialize makeTree 10
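To check a dump produced this way, a reader for the same format could look roughly like the sketch below. It is not part of the original answer: getTree and decodeTree are names made up here, and the code assumes exactly the layout putTree writes (a count byte, then either a length and leaf data, or child sizes followed by the children). Note that in this format a Node with no children cannot be told apart from a Leaf, since both start with a zero byte.

```haskell
import Control.Monad (replicateM)
import qualified Data.ByteString.Lazy as L
import Data.Binary.Get (Get, getWord8, getWord32be, runGet)
import Data.Word (Word32)

data Tree = Node [Tree] | Leaf [Word32] deriving (Eq, Show)

-- Reads one subtree in the format written by putTree.
getTree :: Get Tree
getTree = do
  n <- getWord8                       -- number of children, 0 means leaf
  if n == 0
    then do
      len <- getWord32be              -- number of 32-bit words in the leaf
      Leaf <$> replicateM (fromIntegral len) getWord32be
    else do
      -- Child sizes are read and ignored; a reader that wanted to skip
      -- subtrees without parsing them would use these values instead.
      _sizes <- replicateM (fromIntegral n) getWord32be
      Node <$> replicateM (fromIntegral n) getTree

-- Decodes a whole tree from a lazy ByteString.
decodeTree :: L.ByteString -> Tree
decodeTree = runGet getTree
```

This builds the whole tree in memory, so it is only meant for checking small dumps, not the 4^10-leaf example.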

Answer 2

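I would consider two basic approaches. If the whole serialized structure easily fits into memory, you can serialize each node into a lazy ByteString and simply use the length of each of them to calculate the offsets from the current position.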

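-- Assumes the question's Tree type, plus:
--   import qualified Data.ByteString.Lazy as L
--   import Data.Binary.Put (runPut, putWord32be)
serializeTree :: Tree -> L.ByteString
serializeTree (Leaf nums)     = runPut (mapM_ (putWord32be . fromIntegral) nums)
serializeTree (Node subtrees) = mconcat $ header : childBs
 where
  childBs = map serializeTree subtrees
  -- Offsets are relative to the node start; the header holds one 4-byte offset per child.
  offsets = scanl (\acc bs -> acc + L.length bs) (fromIntegral $ 4 * length subtrees) childBs
  header  = runPut (mapM_ (putWord32be . fromIntegral) $ init offsets)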

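The other option is to go back after serializing a node's children and patch the offset fields with the correct data. This may be the only option if the tree is large, but I don't know of a serialization library that supports it. It would involve working in IO and seeking to the right positions in the file.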
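A rough sketch of that back-patching idea, using plain System.IO handles, might look like this. It is only an illustration under some assumptions: writeTree, writeTreeFile and putB are names made up here, leaves hold Word32 values, and offsets are taken relative to the start of each node.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L
import Data.Word (Word32)
import System.IO (Handle, IOMode (WriteMode), SeekMode (AbsoluteSeek),
                  hSeek, hTell, withBinaryFile)

data Tree = Node [Tree] | Leaf [Word32]

-- Small helper: write a Builder at the current handle position.
putB :: Handle -> B.Builder -> IO ()
putB h = L.hPut h . B.toLazyByteString

-- Writes a subtree at the current position and returns its size in bytes.
writeTree :: Handle -> Tree -> IO Integer
writeTree h (Leaf ns) = do
  putB h (foldMap B.word32BE ns)
  pure (4 * fromIntegral (length ns))
writeTree h (Node ts) = do
  start <- hTell h
  let headerSize = 4 * fromIntegral (length ts) :: Integer
  putB h (foldMap (const (B.word32BE 0)) ts)     -- placeholder offsets
  sizes <- mapM (writeTree h) ts                 -- children, in order
  end <- hTell h
  let offsets = init (scanl (+) headerSize sizes)
  hSeek h AbsoluteSeek start                     -- go back to the header
  putB h (foldMap (B.word32BE . fromIntegral) offsets)
  hSeek h AbsoluteSeek end                       -- continue after the node
  pure (end - start)

writeTreeFile :: FilePath -> Tree -> IO ()
writeTreeFile path t = withBinaryFile path WriteMode $ \h -> do
  _ <- writeTree h t
  pure ()
```

The trade-off is two seeks per node, each of which flushes the handle's buffer, in exchange for never needing the sizes before the children have been written.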

Answer 3

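I think what you want is an explicit two-pass solution. The first pass converts your tree into a size-annotated tree. This pass forces the tree, but can in fact be done without any monadic machinery at all. The second pass runs in the plain old Put monad and, given that the size annotations have already been calculated, should be very straightforward.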
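As a sketch of what those two passes could look like when the output should contain offsets, as in the question's layout: the names Sized, annotate, putSized and serializeTwoPass are invented for this example, leaves are assumed to hold Word32 values, and offsets are assumed to be relative to the start of each node.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Binary.Put (Put, putWord32be, runPut)
import Data.Word (Word32)

data Tree = Node [Tree] | Leaf [Word32]

-- The same tree, with every subtree labelled by its serialized size in bytes.
data Sized = SNode Int [Sized] | SLeaf Int [Word32]

size :: Sized -> Int
size (SNode n _) = n
size (SLeaf n _) = n

-- Pass 1: pure, bottom-up size annotation (no monads involved).
annotate :: Tree -> Sized
annotate (Leaf ns) = SLeaf (4 * length ns) ns
annotate (Node ts) = SNode (4 * length ss + sum (map size ss)) ss
  where ss = map annotate ts

-- Pass 2: plain Put; every offset is known from the annotations.
putSized :: Sized -> Put
putSized (SLeaf _ ns) = mapM_ putWord32be ns
putSized (SNode _ ss) = do
  let offsets = init (scanl (+) (4 * length ss) (map size ss))
  mapM_ (putWord32be . fromIntegral) offsets
  mapM_ putSized ss

serializeTwoPass :: Tree -> L.ByteString
serializeTwoPass = runPut . putSized . annotate
```

Because Sized carries the leaf data along, this version keeps the whole tree in memory, so it suits trees that fit in RAM rather than the very large case.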

Answer 4

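Here is an implementation that uses Builder, which is part of the "binary" package. I have not profiled it properly, but according to "top" it immediately allocates 108 MB and then holds on to that for the rest of the execution.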

Note that I have not tried reading the data back, so there may be errors lurking in my size and offset calculations.

-- Paste this into TreeBinary.hs, and compile with
--    ghc -O2 --make TreeBinary.hs -o TreeBinary

module Main where


import qualified Data.ByteString.Lazy as BL
import qualified Data.Binary.Builder as B

import Data.List (init)
import Data.Monoid
import Data.Word


-- -------------------------------------------------------------------
-- Test data.

data Tree = Node [Tree] | Leaf [Word32] deriving Show

-- Approximate size in memory (ignoring laziness) I think is:
-- 101 * 4^9 * sizeof(Int) + 1/3 * 4^9 * sizeof(Node)

-- This version uses [Word32] instead of [Int] to avoid having to write
-- a builder for Int.  This is an example of lazy programming instead
-- of lazy evaluation. 

makeTree :: Tree
makeTree = makeTree1 9
  where makeTree1 0 = Leaf [0..100]
        makeTree1 n = Node [ makeTree1 $ n - 1
                           , makeTree1 $ n - 1
                           , makeTree1 $ n - 1
                           , makeTree1 $ n - 1 ]

-- --------------------------------------------------------------------
-- The actual serialisation code.


-- | Given a tree, return a builder for it and its estimated length in bytes.
serialiseTree :: Tree -> (B.Builder, Word32)
serialiseTree (Leaf ns) = (mconcat (B.singleton 2 : map B.putWord32be ns), fromIntegral $ 4 * length ns + 1)
serialiseTree (Node ts) = (mconcat (B.singleton 1 : map B.putWord32be offsets ++ branches), 
                           baseLength + sum subLengths)
   where
      (branches, subLengths) = unzip $ map serialiseTree ts
      baseLength = fromIntegral $ 1 + 4 * length ts
      offsets = init $ scanl (+) baseLength subLengths


main = do
   putStrLn $ "Length = " ++ show (snd $ serialiseTree makeTree)
   BL.writeFile "test.bin" $ B.toLazyByteString $ fst $ serialiseTree makeTree
