How to efficiently apply a medium-weight function in parallel

I'm looking to map a modestly expensive function onto a large lazy seq in parallel. pmap is great, but I'm losing too much to context switching. I think I need to increase the size of the chunk of work that's passed to each thread.

I wrote a function to break the seq into chunks, pmap the function onto each chunk, and recombine them. This works, but the results have not been spectacular. The original code essentially looks like this:

(pmap eval-polynomial (range x) coefficients)

How can I really squeeze this while keeping it lazy?

Best answer

How about using the partition function to break up your range sequence? There was an interesting post on a similar problem at http://www.fatvat.co.uk/2009/05/jvisualvm-and-clojure.html
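
For illustration, a minimal sketch of that idea; the chunk size of 512 and the helper name chunked-pmap are arbitrary, and partition-all is used rather than partition so the final short chunk isn't dropped:

(defn chunked-pmap
  "Maps f over coll in parallel, one chunk per thread task instead of
  one element per task, then lazily stitches the chunks back together."
  [chunk-size f coll]
  (->> coll
       (partition-all chunk-size)   ; split the lazy seq into chunks
       (pmap #(doall (map f %)))    ; each task realizes a whole chunk
       (apply concat)))             ; flatten back into a single seq

;; e.g. (chunked-pmap 512 #(reduce + (range %)) (range 10000))

Because pmap and concat both produce lazy sequences, the result stays (semi-)lazy, which matches the question's constraint.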

Answers

I'd look at the ppmap function from http://www.braveclojure.com/zombie-metaphysics/. It lets you pmap while specifying the chunk size.

The solution to this problem is to increase the grain size, or the amount of work done by each parallelized task. In this case, the task is to apply the mapping function to one element of the collection. Grain size isn’t measured in any standard unit, but you’d say that the grain size of pmap is one by default. Increasing the grain size to two would mean that you’re applying the mapping function to two elements instead of one, so the thread that the task is on is doing more work. [...] Just for fun, we can generalize this technique into a function called ppmap, for partitioned pmap. It can receive more than one collection, just like map:

(defn ppmap
  "Partitioned pmap, for grouping map ops together to make parallel
  overhead worthwhile"
  [grain-size f & colls]
  (apply concat
   (apply pmap
          (fn [& pgroups] (doall (apply map f pgroups)))
          (map (partial partition-all grain-size) colls))))
(time (dorun (ppmap 1000 clojure.string/lower-case orc-name-abbrevs)))
; => "Elapsed time: 44.902 msecs"
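
Applied to the question's original call, that would look something like the following; the grain size of 500 is an arbitrary starting point to tune against your data, not a recommendation from the book:

(ppmap 500 eval-polynomial (range x) coefficients)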

If you don't mind something slightly exotic (in exchange for some really noticeable speedup), you might also want to look into the work done by the author of the Penumbra library, which provides easy access to the GPU.

I would look at the Fork/Join library, set to be integrated into JDK 7. It's a lightweight threading model optimized for nonblocking, divide-and-conquer computations over a dataset, using a thread pool, a work-stealing scheduler, and green threads.

Some work has been done to wrap the Fork/Join API in the par branch, but it hasn't been merged into main (yet). In the meantime you can reach Fork/Join directly through Java interop; a sketch follows below.
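
A minimal, hypothetical sketch of that interop, assuming JDK 7+ for java.util.concurrent; the helper name fork-join-map and the threshold of 1000 elements are made up for illustration, not part of any library:

(import '[java.util.concurrent ForkJoinPool RecursiveTask])

(defn fork-join-map
  "Recursively halves the input until pieces are at most threshold
  elements, maps f over the leaves in parallel via Fork/Join, and
  concatenates the results back in order."
  [^ForkJoinPool pool f threshold v]
  (letfn [(task [v]
            (proxy [RecursiveTask] []
              (compute []
                (if (<= (count v) threshold)
                  (mapv f v)                                ; small enough: do the work
                  (let [half      (quot (count v) 2)
                        left      (doto (task (subvec v 0 half))
                                    (.fork))                ; left half runs async
                        right-res (.invoke (task (subvec v half)))] ; right half here
                    (into (.join left) right-res))))))]     ; left results, then right
    (.invoke pool (task (vec v)))))

;; e.g. (fork-join-map (ForkJoinPool.) #(* % %) 1000 (range 100000))

This is essentially the same grain-size idea as ppmap, except the splitting and load balancing are handled by the work-stealing scheduler.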




