Just got the initial C port of jcBench completed. Right now there are working IRIX 6.5 MIPS IV and Win32 x86 binaries. I'm hoping to add additional functionality and then merge back in the changes I made for the original 4 platforms. I should note that the performance numbers between the two versions will not be comparable: I rewrote the actual benchmarking algorithm to be solely integer based. That's not to say I won't add a floating point test later, but it made sense after porting the C# code to C. That being said, after finding out a while back how the Task Parallel Library (TPL) really works, my multi-threading implementation using POSIX threads does things a little differently.

Where the TPL starts off with one thread and dynamically increases the number of threads as it continues processing, my implementation simply takes the number of threads specified on the command line, divides the work (in my case the number of objects) by the number of threads, and kicks off all of the threads from the start. While the TPL's approach is great for work where you don't know whether it will even use the maximum number of CPUs/cores efficiently, in my case it actually hinders performance. I'm now wondering whether you can specify from the start how many threads to kick off. If not, Microsoft, maybe add support for that? I've got a couple of scenarios that I know would benefit from at least 4-8 threads initially, especially for data migration, which I prefer to do in C# versus SSIS (call me a control freak).
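For anyone curious what "divide the work and kick off every thread from the start" looks like, here is a minimal sketch with POSIX threads. It is not the actual jcBench code: the names work_slice, run_slice and run_static are made up for illustration, the loop body is a placeholder integer workload, and handing the leftover objects to the first few threads is my assumption about how to deal with a count that doesn't divide evenly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work description; not taken from jcBench. */
typedef struct {
    long start;            /* first object index handled by this thread */
    long count;            /* number of objects handled by this thread */
    unsigned long result;  /* integer result produced by this thread */
} work_slice;

/* Placeholder integer-only workload standing in for the real benchmark. */
static void *run_slice(void *arg)
{
    work_slice *slice = arg;
    unsigned long sum = 0;

    for (long i = slice->start; i < slice->start + slice->count; i++)
        sum += (unsigned long)i * 31u + 7u;

    slice->result = sum;
    return NULL;
}

/* Split num_objects across num_threads and start every thread up front.
   Returns 0 on success, -1 if allocation or thread creation failed.
   *total receives the sum of the per-thread results. */
int run_static(long num_objects, int num_threads, unsigned long *total)
{
    assert(num_objects >= 0 && num_threads > 0 && total != NULL);

    pthread_t *threads = malloc(sizeof *threads * (size_t)num_threads);
    work_slice *slices = malloc(sizeof *slices * (size_t)num_threads);
    if (threads == NULL || slices == NULL) {
        free(threads);
        free(slices);
        return -1;
    }

    long base = num_objects / num_threads;   /* objects per thread */
    long extra = num_objects % num_threads;  /* leftovers (assumed handling) */
    long next = 0;
    int started = 0;
    int status = 0;

    for (int t = 0; t < num_threads; t++) {
        slices[t].start = next;
        slices[t].count = base + (t < extra ? 1 : 0);
        slices[t].result = 0;
        next += slices[t].count;

        if (pthread_create(&threads[t], NULL, run_slice, &slices[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Wait for every thread that was started and combine the results. */
    *total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        *total += slices[t].result;
    }

    free(threads);
    free(slices);
    return status;
}
```

Calling something like run_static(1000000, 4, &total), with the 4 parsed from the command line, starts all four threads immediately instead of ramping up. Depending on the platform, linking needs -pthread or -lpthread.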

Back to jcBench: at least with the current algorithm, it appears that a 600 MHz MIPS R14000A with 4 MB of L2 cache is roughly equivalent to a 1200 MHz Phenom II with 512 KB of L2 cache and 6 MB of L3 cache, at least in integer performance. This is based on a couple of runs of the new version of jcBench. It'll be interesting to see whether this 1-to-2 ratio continues with NUMAlink. I'm hoping to see how different generations of AMD CPUs compare to the now 10-year-old MIPS CPU.