English 中文(简体)
Manually implementing high performance algorithms in .NET
原标题:

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#.

Apart from needing to add an extra range check on the left/right variables before the recursive call, it appears to work quite well.

I knew beforehand that the framework provides a built-in Quicksort implementation in List<>.Sort (via Array.Sort). So I tried some basic profiling to compare performance. Results: The built-in List<>.Sort method, operating on the same lists, performs about 10 times faster than my own manual implementation.

Using reflector, I found that the actual sorting in List<>.Sort is implemented in external code, not IL (in a function named tryszsort()).

Looking at my own Quicksort implementation I would expect that replacing the recursive calls with iteration might give some improvement. Also, disabling array bounds checking (if possible) could also give some benefits. Maybe this would get some way closer to the built-in implementation but I m not confident.

So my question: Is it realistic to expect performance in an optimised algorithm (written in .NET IL, jitted to native code) can compete with performance of an externally implemented algorithm?

Once again, I realise Quicksort is provided as part of the framework, this was just a learning experience for me. However there are also many algorithms (CRC32 comes to mind) that are not provided, but still could be of much value to many applications. Here s a related question regarding implementing CRC32 in .NET and performance issues.

So if you need to implement such an algorithm in .NET, what are the major performance considerations to understand, so that your algorithm can at least approach the performance of external code?

[Update]

I have improved execution speed to within about 10% of the built in Array.Sort by changing the algorithm to operate on a simple array of Int, instead of List. In Reflector, I can see this avoids a Callvirt() operation on every get or set on the list. I thought this might improve things, but I m surprised by how much.

最佳回答

By using non-recursive code and, especially, by using "unsafe" blocks and pointer arithmetic (if applicable) you could actually see a x5 or x10 performance gain with an algorithm written in C#. As always with performance (and even more when dealing with a managed environment), you never quite know until you try it and benchmark it.

Now, generally speaking, you should mostly write stuff in C#, then optimize it, optimize it some more, and if it s still not good enough, identify the exact critical piece of code and port it to C++ (while being careful about limiting the number of managed/native call boundaries).

问题回答

Just out of curiosity, as despite my 9 years of experience with .NET, I still constantly make this mistake: Did you compile your code in Release mode with optimizations on? Debug code performs significantly worse than optimized release code.

Assuming you DID compile in release mode, there shouldn t be a huge difference in performance if you implement the algorithm similarly (i.e. iterative vs. iterative or recursive vs. recursive). If you wish to see the .NET implementation and figure out, you can download the SSCLI, Share-Source Common Language Infrastructure. This is Microsoft s publicly availble ECMA-standard compliant CLI implementation. It is not 100% of the .NET framework we all know and love, but it is a significant portion of it. It can provide a lot of information that Reflector can t, including internal implementations. All types of code is available, including C#, C++, and even some Assembler in a couple cases.

Make sure you re comparing apples and apples.

When sorting, it s possible for the Compare function to be dominant, and that might differ between the implementations.

Assuming the Compare function in both cases is fast enough not to be an issue, then the time can be dominated by stuff like array bounds checking, which can easily make a big difference.





相关问题
How to add/merge several Big O s into one

If I have an algorithm which is comprised of (let s say) three sub-algorithms, all with different O() characteristics, e.g.: algorithm A: O(n) algorithm B: O(log(n)) algorithm C: O(n log(n)) How do ...

Grokking Timsort

There s a (relatively) new sort on the block called Timsort. It s been used as Python s list.sort, and is now going to be the new Array.sort in Java 7. There s some documentation and a tiny Wikipedia ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Enumerating All Minimal Directed Cycles Of A Directed Graph

I have a directed graph and my problem is to enumerate all the minimal (cycles that cannot be constructed as the union of other cycles) directed cycles of this graph. This is different from what the ...

Quick padding of a string in Delphi

I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...

热门标签