English 中文(简体)
C# Stream Design Question
原标题:

I have an appliction right now that is a pipeline design. In one the first stage it reads some data and files into a Stream. There are some intermediate stages that do stuff to the stream of data. And then there is a final stage that writes the stream out to somewhere. This all happens serially, one stage completes and then hands off to the next stage.

This all has been working just great, but now the amount of data is starting to get quite a bit larger (hundreds of GB potentially). So I m thinking that I will need to do something to alleviate this. My initial thought is what I m looking for some feedback on (being an independent developer I just don t have anywhere to bounce the idea off of).

I m thinking of creating a Parallel pipeline. The Object that starts off the pipeline would create all of the stages and kick each one off in it s own thread. When the first stage gets the stream to some certain size then it will pass that stream off to the next stage for processing and start up a new stream of its own to continue to fill up. The idea here being that the final stage will be closing out streams as the first stage is building a new ones so my memory usage would be kept lower.

So questions: 1) Any high level thoughts on directions for this design? 2) Is there a simpler approach that you can think of that might apply here? 3) Is there anything existing out there that does something like this that I could reuse (not a product I have to buy)?

Thanks,

MikeD

最佳回答

The producer/consumer model is a good way to proceed. And Microsoft has their new Parallel Extensions which should provide most of the ground work for you. Look into the Task object. There s a preview release available for .NET 3.5 / VS2008.

Your first task should read blocks of data from your stream and then pass them onto other tasks. Then, have as many tasks in the middle as logically fit. Smaller tasks are (generally) better. The only thing you need to watch out for is to make sure the last task saves the data in the order it was read (because all the tasks in the middle may finish in a different order to what they started).

问题回答

For the design you ve suggested, you d want to have a good read up on producer/consumer problems if you haven t already. You ll need a good understanding of how to use semaphores in that situation.

Another approach you could try is to create multiple identical pipelines, each in a separate thread. This would probably be easier to code because it has a lot less inter-thread communication. However, depending on your data you may not be able to split it into chunks this way.

In each stage, do you read the entire chunk of data, do the manipulation, then send the entire chuck to the next stage?

If that is the case, you are using a "push" technique where you push the entire chunk of data to the next stage. Are you able to handle things in a more stream like manor using a "pull" technique? Each stage is a stream, and as you read data from that stream, it pulls data from the previous stream by calling read on it. As each stream is being read, it reads from the previous stream in small bits, processes it and returns the processed data. The destination stream determines how many bytes to read from the previous stream, and you don t ever have to consume large amounts of memory. This is how applications like BizTalk work. There are some blogs about how BizTalk Pipeline streams work, and I think it might be exactly what you want.

Here s a multi-part blog entry that you might find interesting:

Part 1
Part 2
Part 3
Part 4
Part 5





相关问题
Anyone feel like passing it forward?

I m the only developer in my company, and am getting along well as an autodidact, but I know I m missing out on the education one gets from working with and having code reviewed by more senior devs. ...

NSArray s, Primitive types and Boxing Oh My!

I m pretty new to the Objective-C world and I have a long history with .net/C# so naturally I m inclined to use my C# wits. Now here s the question: I feel really inclined to create some type of ...

C# Marshal / Pinvoke CBitmap?

I cannot figure out how to marshal a C++ CBitmap to a C# Bitmap or Image class. My import looks like this: [DllImport(@"test.dll", CharSet = CharSet.Unicode)] public static extern IntPtr ...

How to Use Ghostscript DLL to convert PDF to PDF/A

How to user GhostScript DLL to convert PDF to PDF/A. I know I kind of have to call the exported function of gsdll32.dll whose name is gsapi_init_with_args, but how do i pass the right arguments? BTW, ...

Linqy no matchy

Maybe it s something I m doing wrong. I m just learning Linq because I m bored. And so far so good. I made a little program and it basically just outputs all matches (foreach) into a label control. ...

热门标签