Question

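I have read that the laziness of sequences can cause out-of-memory errors when loop/recur is used on large sequences. I am trying to load a 3MB file into memory in order to process it, and I think this is what is happening to me. However, I don't know whether there is an idiomatic way to fix it. I tried adding doall calls, but then my program did not seem to terminate. Small inputs work: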

Small input (contents of file): AAABBBCCC Correct output: ((65 65) (65 66) (66 66) (67 67) (67 67))

Code:

(def file-path "/Users/me/Desktop/temp/bob.txt")
;(def file-path "/Users/me/Downloads/3MB_song.m4a")

(def group-by-twos
  (fn [a-list]
    (let [first-two (fn [a-list] (list (take 2 a-list)))
          the-rest-after-two (fn [a-list] (rest (rest a-list)))
          only-two-left? (fn [a-list] (if (= (count a-list) 2) true false))]
      (loop [result  () rest-of-list a-list]
        (if (nil? rest-of-list)
          result
          (if (only-two-left? rest-of-list)
            (concat result (list rest-of-list))
            (recur (concat result (first-two rest-of-list))
                   (the-rest-after-two rest-of-list))))))))

(def get-the-file
  (fn [file-name-and-path]
   (let [the-file-pointer
           (new java.io.RandomAccessFile (new java.io.File file-name-and-path) "r")
         intermediate-array (byte-array (.length the-file-pointer))] ;reserve space for final length
      (.readFully the-file-pointer intermediate-array)
      (group-by-twos (seq intermediate-array)))))

(get-the-file file-path)

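As I said above, when I put doall calls everywhere, it does not seem to finish. How can I get this to run for large files, and is there a way to reduce the cognitive burden of working out what I need to do? Some rule of thumb?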

Answer 1

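You are reading the whole file into an array and then calling seq on that array. That gives you none of the benefits of a lazy sequence, because all of the data is already loaded in memory. A lazy sequence is really about producing or generating data only when it is needed.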

What you can do instead is build the sequence over the file contents with something like:

(def get-the-file
  (fn [file-name-and-path]
   (let [the-file-pointer
           (new java.io.RandomAccessFile (new java.io.File file-name-and-path) "r")
         file-len (.length the-file-pointer)] ;get file len
      (partition 2 (map (fn [_] (.readByte the-file-pointer)) (range file-len))))))

NOTE: I haven't actually tested this, but I hope it at least gives you an idea of how to read the file lazily.

Answer 2

I guess an idiomatic solution would be:

 (partition 2 (map int (slurp "/Users/me/Desktop/temp/bob.txt")))

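This is not fully lazy, because the whole file is loaded into memory, but that should not be a problem for files that are not too large. partition and map are lazy, however, so if you replace slurp with a buffered reader you get a fully lazy version.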

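Note: if the size of the file is odd, this will swallow the last character. It is not clear what you expect when the size is not even. If you want the last value in a list of its own, you can use (partition 2 2 [] ...).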

user=> (partition 2 (map int "ABCDE"))
((65 66) (67 68))
user=> (partition 2 2 [] (map int "ABCDE"))
((65 66) (67 68) (69))
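
The buffered variant mentioned above could look roughly like the following. This is a minimal sketch, not a tested solution for the asker's file: byte-seq and process-pairs are hypothetical helper names, and a buffered input stream (from clojure.java.io/input-stream) stands in for the buffered reader because the data in the question is raw bytes rather than text. Note that .read returns unsigned values (0 to 255), whereas a byte-array holds signed bytes.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: lazily yields one int (0-255) per byte until EOF.
(defn byte-seq
  [^java.io.InputStream in]
  (lazy-seq
    (let [b (.read in)]          ; .read returns -1 at end of stream
      (when-not (neg? b)
        (cons b (byte-seq in))))))

;; Hypothetical helper: opens the file, hands the lazy pairs to f,
;; and closes the stream afterwards. f must consume everything it
;; needs before returning, because the stream is closed on exit.
(defn process-pairs
  [path f]
  (with-open [in (io/input-stream path)]
    (f (partition 2 2 [] (byte-seq in)))))

;; Usage: count the pairs without keeping them all in memory.
;; (process-pairs "/Users/me/Desktop/temp/bob.txt" count)
```

Passing a function such as count or a reduce keeps only a small window of the data alive at any time. Passing doall realizes the whole result, which is fine for small files but gives up the memory benefit.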

Answer 3

Beware of Clojure data structures when dealing with large amounts of data. (A typical Clojure app uses two to three times as much memory as the same Java application; sequences are memory-expensive.) If you can read the whole data into an array, do that. Then process it while making sure you don't keep a reference to any sequence head, so that garbage collection can happen during processing.

Also, strings are much bigger than char primitives. A single-char string is 26 bytes and a char is 2 bytes. Even if you don't like using arrays, an ArrayList is several times smaller than a sequence or a vector.
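
The array-based advice above can be sketched as follows. This is only an illustration under stated assumptions: reduce-pairs is a hypothetical helper name, and it walks the byte array by index with loop/recur, so no intermediate sequence is created and there is no sequence head to hold on to. A trailing odd byte is ignored here.

```clojure
;; Hypothetical helper: folds f over consecutive byte pairs of arr.
;; f receives the accumulator and the two bytes of each pair.
(defn reduce-pairs
  [f init ^bytes arr]
  (let [n (alength arr)]
    (loop [i 0
           acc init]
      (if (< (inc i) n)
        (recur (+ i 2) (f acc (aget arr i) (aget arr (inc i))))
        acc))))

;; Usage: count pairs whose two bytes are equal.
;; (reduce-pairs (fn [acc a b] (if (= a b) (inc acc) acc))
;;               0
;;               (.getBytes "AAABBBCCC"))
```

If the accumulator itself collects every pair, memory use grows with the file again, so this helps most when the result is a summary such as a count or a histogram.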
