latest posts


For the longest time I’ve had these great ideas only to keep them in my head, then watch someone else or some company turn around and develop the very same idea (not to say anyone stole it; given that there are billions of people on this planet, it is only natural to assume one of those billions would come up with the same idea). Having watched this happen, as I am sure other developers have since the 70s, I’ve decided to put my outlook on things here, once a year, every July.

Anyone who reads or has read my blog for a decent amount of time knows I am very much a polyglot of software and enjoy the system building/configuration/maintenance aspect of hardware. For me, they go hand in hand. The more I know about the platform itself (single-threaded versus multi-threaded performance, disk IOPS, etc.), the better I can program the software I develop. Likewise, the more I know about a specific programming model, the better I will know the hardware it is specialized for. Taken a step further, this makes implementation decisions, both at work and in my own projects, better.

As mentioned in the About Me section, I started out in QBasic, and a little later, when I was 12, I really started getting into custom PC building (which wasn’t anywhere near as big as it is today): digging through the massive Computer Shopper magazines, drooling over the prospect of the highest end Pentium MMX CPUs, massive (at the time) 8 GB hard drives and 19” monitors. Along with that came the less glamorous 90s PC issues of IRQ conflicts, pass-through 3Dfx Voodoo cards that required a 2D video card (and yet another PCI slot), SCSI PCI controllers and dedicated DVD decoders. Suffice it to say I am glad I experienced all of that, because it created a huge appreciation for USB, PCI Express, SATA and, if nothing else, the stability of running a machine 24/7 under a heavy workload (yes, part of that is also software).

To return to the blog’s title…

Thoughts on the Internet of Things?

Broadly, I do follow the Internet of Things (IoT) mindset: everything will be interconnected, which raises the question of privacy and what that means for the developer of the hardware, the developer of the software and the consumer. As we all know, your data is money. If the lights in your house, for instance, were WiFi enabled and connected to a centralized server in your house with an exposed client on a tablet or phone, I would be willing to bet the hardware and software developers would love to know the total energy usage, which lights in which room were on, what type of bulbs were installed and when the bulbs were dying. Marketing data could then be sold to let you know of bundle deals, new “more efficient” bulbs, or how much time is spent in which rooms (if you are in the home theater room a lot, sell the consumer on Blu-rays and snacks, for instance). With each component of your home becoming this way, more and more data will be captured, and in some cases it will be possible to predict what you want before you realize it, simply based off your tendencies.

While I don’t like the lack of privacy in that model (hopefully some laws can be enacted to resolve those issues), and as a software developer I would hate to ever be associated with the backlash of capturing that data, this idea of everything being connected will create a whole new programming model. With the recent trend towards REST web services returning gzipped JSON, with WebAPI for instance, the problem of submitting and retrieving data has never been easier or more portable across so many platforms. With C# in particular, in conjunction with the HttpClient library available on NuGet, a lot of the grunt work is already done for you in an asynchronous manner. Where I do see a change is in the standardization of an API for your lights, TV, garage door, toaster, etc., allowing 3rd party plugins and universal clients to be created, rather than having a different app to control each element, or one company providing a proprietary API that only works on its own devices, forcing the consumer into the difficult decision to either stay with that provider for consistency or mix the two, requiring 2 apps/devices.
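To make that concrete, here is a minimal sketch of hitting such a REST endpoint with HttpClient; the hub address and the `/api/lights` route are hypothetical, purely for illustration:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Minimal sketch of a client for a hypothetical home lighting hub.
class LightHubClient
{
    private static readonly HttpClient _client = new HttpClient();

    // Builds the request URI for the (hypothetical) lights resource.
    public static Uri BuildLightsUri(string hubUrl)
    {
        return new Uri(hubUrl.TrimEnd('/') + "/api/lights");
    }

    // Asynchronously fetches the JSON payload; the await frees the
    // calling thread while the request is in flight.
    public static async Task<string> GetLightsJsonAsync(string hubUrl)
    {
        return await _client.GetStringAsync(BuildLightsUri(hubUrl));
    }
}
```

A universal client app would differ from this sketch only in the route and the shape of the JSON it deserializes, which is exactly why a standardized API across device makers would be so powerful.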

Where do I see mobile technology going?

Much like where mobile devices have been heading (as I predicted 2 years ago), apps are becoming ever more integrated into your device (for better or for worse). I don’t see this trend changing, but I do hope that from a privacy standpoint apps have to become more explicit about what they are accessing. I know there is a fine line for the big three (Apple, Google and Microsoft) before becoming overly explicit about every action (remember Vista?), but I think if an app gets more than your current location, the capabilities should be presented in a bolder or larger font to better convey the app’s true access to your device. I don’t see this situation getting better from a privacy standpoint, but I do see more and more customer demand for the “native” experience to be like that of Cortana on Windows Phone 8.1: she has access to the data you provide her and will help make your experience better. As phones provide more and more APIs, this trend will only continue until apps are more like plugins to your base operating system’s experience, integrating into services like Yelp, Facebook, Twitter, etc.

Where do I see web technology going?

I have enjoyed diving into MVC over the last year and a half. The model definitely feels much more in line with an MVVM XAML project, but it still has an overwhelmingly strong tie to the client side, between the heavy use of jQuery and the level of effort in keeping up with the ever changing browser space (i.e. browser updates coming out at an alarming rate). While I think we all appreciate it when we go to a site on our phones or desktop and it scales nicely, providing a rich experience no matter the device, I feel the ultimate goal of trying to achieve a native experience in the browser is a waste of effort. I know just about every web developer might stop reading and be in outrage – but what was the goal of the last web site you developed that was also designed for mobile? Was it to convey information to the masses? Or was it simply a stop gap until you had a native team to develop for the big three mobile platforms?

In certain circumstances I agree with the stance of making HTML 5 web apps instead of native apps, especially when the cost of a project is prohibitive. But at a certain point, especially as of late with Xamarin’s first class citizen status with Microsoft, you have to ask yourself: could I deliver a richer experience natively, and possibly faster (especially given the vast range of mobile browsers to contend with in the HTML 5 route)?

If you’re a C# developer who wants to deliver a native experience, definitely give the combination of MvvmCross, Xamarin’s framework and Portable Class Libraries a try. I wish all of those tools existed when I first dove into iOS development 4 years ago.

Where do I see desktop apps going?

In regards to desktop applications, I don’t see them going away even in the “app store” world we live in now. I do, however, see customers demanding a richer experience, having had a rich native experience on their phone or after using a XAML Windows 8.x Store App. The point being, I don’t think it will be acceptable for an app to look and feel like the default WinForms grey and black color scheme that we’ve all used at one point in our careers, and with which more than likely we began our programming (thinking back to classic Visual Basic).

Touch will also play a big factor in desktop applications (even in the enterprise). Recently at work I developed a Windows 8.1 Store App for an executive dashboard. I designed the app with touch in mind, and it was interesting how that changes your perspective on interacting with data. The app in question utilized multi-layered graphs and a Bing Map with several layers (heat maps and pushpins). Gone was the unnatural mouse scrolling; instead there was pinching, zooming and rotating, as if one were in a science fiction movie from just 10 years ago.

I see this trend continuing, especially as practical general purpose devices like laptops gain touch screens at every price point, instead of commanding the premium they previously did. All that needs to come about is a killer application for the Windows Store – could your next app be that app?

Where is programming heading in general?

Getting programmers out of the single-threaded, top-to-bottom programming mindset. I am hoping that next July, when I do a prediction post, this won’t even be a discussion point, but sadly I don’t see it changing anytime soon. Taking a step back and looking at what this means, generally speaking: programmers aren’t utilizing the hardware available to them to its full potential.

Over 5 years ago at this point, I found myself at odds with a consultant who kept asking for more and more CPUs added to a particular VM. When he first asked, it seemed reasonable, as there was considerably more traffic coming to a particular ASP.NET 3.5 Web Application as a result of a lot of eagerly awaited functionality he and his team had just deployed. Even after the additional CPUs were added, his solution was still extremely slow under no load. This triggered me to review his Subversion check-ins, and I realized the crux of the matter wasn’t the server – it was his single-threaded, resource intensive, time consuming code. In this case, the code was poorly written on top of trying to perform a lot of work on a particular page. For those that remember back to .NET 3.5’s implementation of LINQ, it wasn’t exactly a strong performer in performance intensive applications, let alone when looped through multiple times as opposed to one larger LINQ query. The moral of the story: adding CPUs to single-threaded code only helped with handling the increased load, not the performance of a user’s experience in a 0% load session.
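To illustrate that LINQ pitfall with hypothetical data, here is the shape of the slow code (the same source enumerated three separate times) next to a version that materializes one query and reuses it; the numbers and predicate are purely illustrative:

```csharp
using System.Collections.Generic;
using System.Linq;

class LinqPitfall
{
    // The slow shape: the source is enumerated three times,
    // each time re-running the same filter.
    public static int SlowStats(IEnumerable<int> rows)
    {
        return rows.Where(r => r > 10).Count()
             + rows.Where(r => r > 10).Sum()
             + rows.Where(r => r > 10).Max();
    }

    // One query, materialized once; subsequent aggregates are
    // cheap in-memory passes over the already-filtered list.
    public static int FastStats(IEnumerable<int> rows)
    {
        List<int> filtered = rows.Where(r => r > 10).ToList();
        return filtered.Count + filtered.Sum() + filtered.Max();
    }
}
```

Both return the same answer; the difference is how many times the (potentially expensive) source enumeration runs.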

A few months later, when .NET 4 came out of beta, and further still when the Task Parallel Library was released, my view on performance changed (after all, jcBENCH stemmed from my passion for diving into parallel programming on different architectures and operating systems back in January 2012). No longer was I relying on high single-threaded-performance CPUs, but instead writing my code to take advantage of the ever-increasing number of cores available to me at this particular client (for those curious, 2U 24-core Opteron HP G5 rackmount servers).

With .NET 4.5’s async/await I was hopeful that more developers I worked with would take advantage of this easy model and no longer lock the UI thread, but I was largely disappointed. If developers couldn’t grasp async/await, let alone the TPL, how could they proceed to what I feel is an even bigger breakthrough becoming available to developers: Heterogeneous Programming, or more specifically OpenCL.
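For reference, a minimal async/await sketch of the pattern I was hoping to see: CPU-bound work pushed off the calling (UI) thread via Task.Run, with the method and workload here being purely illustrative:

```csharp
using System.Threading.Tasks;

class AsyncExample
{
    // Offloads a CPU-bound loop to the thread pool; the calling (UI)
    // thread is free until the await resumes with the result.
    public static async Task<long> SumSquaresAsync(int n)
    {
        return await Task.Run(() =>
        {
            long total = 0;
            for (int i = 1; i <= n; i++)
                total += (long)i * i;
            return total;
        });
    }
}
```

From a button click handler you would simply `long result = await AsyncExample.SumSquaresAsync(100000);` and the UI stays responsive the whole time.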

With parallel programming comes the need to break down your problem into independent problems, all coming together at a later time (like breaking down image processing to look at a range of pixels rather than the entire image for instance). This is where Heterogeneous Programming can make an even bigger impact, in particular with GPUs (Graphics Processing Units) which have upwards of hundreds of cores to process tasks.
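A minimal sketch of that decomposition using the TPL, treating each image row as an independent work item (the separate per-channel arrays are an assumption for illustration; a real image library would expose pixels differently):

```csharp
using System;
using System.Threading.Tasks;

class GrayscaleExample
{
    // Each row is independent of every other row, so Parallel.For can
    // hand rows out to however many cores are present; the results are
    // combined simply by each row writing to its own slice of 'gray'.
    public static void ToGrayscale(byte[,] r, byte[,] g, byte[,] b, byte[,] gray)
    {
        int height = gray.GetLength(0), width = gray.GetLength(1);
        Parallel.For(0, height, y =>
        {
            for (int x = 0; x < width; x++)
                gray[y, x] = (byte)((r[y, x] + g[y, x] + b[y, x]) / 3);
        });
    }
}
```

The same row-range split is exactly what an OpenCL kernel would do, except with one work item per pixel across hundreds of GPU cores instead of one task per row across a handful of CPU cores.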

I had dabbled in OpenCL as far back as June 2012, working on the OpenCL version of jcBENCH, and I did some further research back in January/February of this year (2014) in preparation for a large project at work – a project for which I ended up using the TPL extensively instead. The problem wasn’t OpenCL’s performance, but my mindset at the time. Before the project began, I thought I knew the problem inside out, but really I only knew it as a human would think about it – not as a machine that only knows 1s and 0s. The problem wasn’t a simple task, nor was it something I had ever attempted previously, so I gave myself some slack when, two months in, it finally hit me what I was really trying to solve: teaching a computer to think like a human. Therefore, when pursuing heterogeneous programming as a possible solution, ensure you have a 100% understanding of the problem and what you are in the end trying to achieve; only then can you judge whether it makes sense to utilize OpenCL instead of a traditional parallel model like the TPL.

So why OpenCL outside of the speed boost? Think about the last laptop or desktop you bought, chances are you have an OpenCL 1.x compatible APU and/or GPU in it (i.e. you aren’t required to spend any more money – just utilizing what has already been available to you). In particular on the portable side, laptops/Ultrabooks that already have a lower performing CPU than your desktop, why utilize the CPU when the GPU could off load some of that work?

The only big problem with OpenCL for C# programmers is the lack of an officially supported interop library from AMD, Apple or any of the other members of the OpenCL working group. Instead you’re at the mercy of using one of the freely available wrapper libraries like OpenCL.NET, or simply writing your own wrapper. I haven’t made up my mind yet as to which path I will go down – but I know at some point a middleware makes sense. Wouldn’t it be neat to have a generic work item and be able to simply pass it off to your GPU(s) whenever you wanted?

As far as where to begin with OpenCL in general, I strongly suggest reading the OpenCL Programming Guide. Those who have done OpenGL and are familiar with the “Red Book”, this book follows a similar pattern with a similar expectation and end result.


Could I be way off? Sure – it’s hard to predict the future while staying grounded in the past that brought us here; it’s hard to let go of how we as programmers and technologists have evolved over the last 5 years to satisfy not only current consumer demand but our own, and to anticipate what is next. What I am more curious to hear is from programmers outside of the CLR, in particular the C++, Java and Python crowds – where they feel the industry is heading and how they see their programming language handling the future. So please leave comments.
After a few days of development, jcBENCH2 is moving along nicely. Features completed:

1. WebAPI and SQL Server Backend for CRUD Operations of Results
2. Base UI for the Windows Store App is completed
3. New Time Based CPU Benchmark inside a PCL
4. Bing Maps Integration for viewing the top 20 results

Screenshot of the app as of tonight:
jcBENCH2 Day 4

What's left?

7/17/2013 - Social Networking to share results
7/18/2013 - Integrate into the other #dev777 projects
7/19/2013 - Bug fixes, polish and publish

More details of the development process will follow after development is complete - right now I would rather focus on the actual development of the app.
Starting a new, old project this weekend as part of the #dev777 project: jcBENCH 2. The idea: 7 developers develop 7 apps and have them all communicate with each other on various platforms.

Those that have been following my blog for a while, might know I have a little program I originally wrote January 2012 as part of my trip down the Task Parallel Library in C# called, jcBENCH. Originally I created Mac OS X, IRIX (in C++), Win32 and Windows Phone 7 ports. This year I created a Windows Phone 8 app and a revamped WPF port utilizing a completely new backend.

So why revisit the project? The biggest reason: I am never 100% satisfied. Because my skill set is constantly expanding, I find myself always wanting to go back and make use of a new technology, even if the end user sees no benefit. It's the principle - never let your code rot.

So what is Version 2 going to entail? Or better put, what are some of the issues in the jcBENCH 1.x codebase?

Issues in the 1.x Codebase

Issue 1

As it stands today all of the ports have different code bases. In IRIX's case this was a necessity since Mono hasn't been ported to IRIX (yet). With the advent of PCL (Portable Class Libraries) I can now keep one code base for all but the IRIX port, leaving only the UI and other platform specific APIs in the respective ports.

Issue 2

On quad core or faster machines the existing benchmark completes in a fraction of the time. This poses two big problems: it doesn't represent a real test of performance over more than a few second span (meaning all of the CPUs may not have enough time to be tasked before completion), and on the flip side, on devices that are much slower (like a cell phone) it could take several minutes. Solution? Implement a 16-second timed benchmark and then calculate the performance based on how many objects were processed during that time.
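A rough sketch of that timed approach - run until the time budget expires and score by completed work items, rather than timing a fixed amount of work (the work item delegate stands in for whatever the benchmark actually exercises):

```csharp
using System;
using System.Diagnostics;

class TimedBenchmark
{
    // Runs workItem repeatedly until the budget elapses and returns how
    // many iterations completed; faster hardware simply completes more
    // iterations in the same fixed window.
    public static long Run(TimeSpan budget, Action workItem)
    {
        long completed = 0;
        Stopwatch sw = Stopwatch.StartNew();
        while (sw.Elapsed < budget)
        {
            workItem();
            completed++;
        }
        return completed;
    }
}
```

With a 16-second budget, a phone and an 8-core desktop both finish in 16 seconds; only the iteration count differs, which is exactly the property the old fixed-work benchmark lacked.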

Issue 3

When testing multi-processor performance, it was cumbersome to test all of the various scenarios. For instance, with an 8 core CPU, as I have with my AMD FX-8350, I had to select 1 CPU, run the benchmark, record the result, select 2 CPUs and repeat, so on and so forth. This took a long time, when in reality it would make more sense to run the benchmark using all cores by default, and then via an advanced option allow the end user to either select a specific test or have it run the entire test suite automatically.
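A sketch of what that automated sweep could look like using ParallelOptions.MaxDegreeOfParallelism to cap the core count on each pass (the placeholder math stands in for the real benchmark work item):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

class ScalingSweep
{
    // Runs the same parallel workload once per core count, from 1 up to
    // Environment.ProcessorCount, capturing the full scaling curve in a
    // single unattended run.
    public static Dictionary<int, TimeSpan> Sweep(int iterations)
    {
        var results = new Dictionary<int, TimeSpan>();
        for (int cores = 1; cores <= Environment.ProcessorCount; cores++)
        {
            var options = new ParallelOptions { MaxDegreeOfParallelism = cores };
            var sw = Stopwatch.StartNew();
            Parallel.For(0, iterations, options, i =>
            {
                // Placeholder work item; the real benchmark body goes here.
                double tmp = Math.Sqrt(i) * Math.Sin(i);
            });
            sw.Stop();
            results[cores] = sw.Elapsed;
        }
        return results;
    }
}
```

On my FX-8350 this replaces eight manual select-run-record cycles with one call.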

Issue 4

No easy way to share the results exists across the board in the current version. In recent versions I added a centralized result database and charting so that no matter the device you could see how your device compared, but there was no easy way to get a screenshot of the benchmark, send the results via email or post them on a social network. Where is the fun in a benchmark if you can't brag about it easily? In Version 2 I plan to focus on this aspect.

Proposed Features for Version 2

1. Rewritten from the ground up utilizing the latest approaches to cross-platform development I have learned since jcBENCH's original release 1/2012. This includes the extensive use of MVVMCross and Portable Class Libraries to cut down on the code duplication among ports.

2. Sharing functionality via Email and Social Networking (Twitter and Facebook) will be provided, in addition a new Bing Map will visually reflect the top performing devices across the globe (if the result is submitted with location access allowed)

3. Using a WebAPI (JSON) backend instead of WCF XML for result submission and retrieval. For this app, since there is no backend processing between servers, WebAPI makes a lot more sense.

4. New time-based benchmark as opposed to timing how long it takes to process X amount of tasks

5. Offer an "advanced" mode to allow the entire test suite to be performed or individual tests (by default it will now use all of the cores available)

6. At launch only a Windows Store app will be available, but Windows Phone 7/8 and Mac OS X ports will be released later this month.

Future Features

Ability to benchmark GPUs is something I have been attempting to get working across platforms and for those that remember I had a special Alpha release last Fall using OpenCL. Once the bugs and features for Version 2 are completed I will shift focus to making this feature a reality.

Implement all of this functionality in an upgraded IRIX port and finally create a Linux port (using Mono). One of the biggest hurdles I was having with keeping the IRIX version up to date was the SOAP C++ libraries not being anywhere near the ease of use a Visual Studio/C# environment offers. By switching over to HTTP/JSON I'm hoping to be able to parse and submit data much more easily.

Next Steps

Given that the project is an app in 7 days, today marks the first day of development. As with any project, the first step was defining a basic feature set, as mentioned above, and now creating a project timeline based on that functional specification.

As with my WordPress to MVC Project in April, this will entail daily blog posts with my progress.

Day 1 (7/13/2013) - Create the new SQL Server Database Schema and WebAPI Backend
Day 2 (7/14/2013) - Create all of the base UI Elements of the Windows Store App
Day 3 (7/15/2013) - Create the PCL that contains the new Benchmark Algorithms
Day 4 (7/16/2013) - Integrate Bing Maps for the location based view
Day 5 (7/17/2013) - Add Social Networking and Email Sharing Options
Day 6 (7/18/2013) - Integrate with fellow #dev777 projects
Day 7 (7/19/2013) - Bug fixing, polish and Windows Store Submission

So stay tuned for an update later today with the successes and implementation of the new SQL Server Database Schema and WebAPI Backend.
Update on 8/18/2012 8:41 AM EST - Expanded my result commentary, added data labels in the graph and made the graph image larger.

An interesting question came up on Twitter this morning about how the overhead of calling Parallel.ForEach vs the more traditional foreach would impact performance. I had done some testing in .NET 4 earlier this year and found that for smaller collections, fewer than 10 objects, the overhead of Parallel.ForEach wasn't worth the performance hit. So I wrote a new test in .NET 4.5 to examine this question on a fairly standard task: taking the results of an EF Stored Procedure and populating a collection of struct objects, in addition to populating a List collection for a 1 to many relationship (to make it more real world).

First thing I did was write some code to populate my 2 SQL Server 2012 tables:
using (TempEntities eFactory = new TempEntities())
{
    Random rnd = new Random(1985);

    for (int x = 0; x < int.MaxValue; x++)
    {
        Car car = eFactory.Cars.CreateObject();
        car.MakerName = x.ToString() + x.ToString();
        car.MilesDriven = x * 10;
        car.ModelName = x.ToString();
        car.ModelYear = (short)rnd.Next(1950, 2012);
        car.NumDoors = (short)(x % 2 == 0 ? 2 : 4);
        eFactory.AddToCars(car);
        eFactory.SaveChanges();

        for (int y = 0; y < rnd.Next(1, 5); y++)
        {
            Owner owner = eFactory.Owners.CreateObject();
            owner.Age = y;
            owner.Name = y.ToString();
            owner.StartOfOwnership = DateTime.Now.AddMonths(-y);
            owner.EndOfOwnership = DateTime.Now;
            owner.CarID = car.ID;
            eFactory.AddToOwners(owner);
            eFactory.SaveChanges();
        }
    }
}
I ran it long enough to produce 121,501 Car rows and 190,173 Owner rows. My structs:
public struct OWNERS
{
    public string Name;
    public int Age;
    public DateTime StartOfOwnership;
    public DateTime? EndOfOwnership;
}

public struct CAR
{
    public string MakerName;
    public string ModelName;
    public short ModelYear;
    public short NumDoors;
    public decimal MilesDriven;
    public List<OWNERS> owners;
}
Then for my Parallel.ForEach code:

static List<OWNERS> ownersTPL(int CarID)
{
    using (TempEntities eFactory = new TempEntities())
    {
        ConcurrentQueue<OWNERS> owners = new ConcurrentQueue<OWNERS>();

        Parallel.ForEach(eFactory.getOwnersFromCarSP(CarID), row =>
        {
            owners.Enqueue(new OWNERS()
            {
                Age = row.Age,
                EndOfOwnership = row.EndOfOwnership,
                Name = row.Name,
                StartOfOwnership = row.StartOfOwnership
            });
        });

        return owners.ToList();
    }
}

static void runTPL(int numObjects)
{
    using (TempEntities eFactory = new TempEntities())
    {
        ConcurrentQueue<CAR> cars = new ConcurrentQueue<CAR>();

        Parallel.ForEach(eFactory.getCarsSP(numObjects), row =>
        {
            cars.Enqueue(new CAR()
            {
                MakerName = row.MakerName,
                MilesDriven = row.MilesDriven,
                ModelName = row.ModelName,
                ModelYear = row.ModelYear,
                NumDoors = row.NumDoors,
                owners = ownersTPL(CarID: row.ID)
            });
        });
    }
}
My foreach code:
static void runREG(int numObjects)
{
    using (TempEntities eFactory = new TempEntities())
    {
        List<CAR> cars = new List<CAR>();

        foreach (getCarsSP_Result row in eFactory.getCarsSP(numObjects))
        {
            List<OWNERS> owners = new List<OWNERS>();

            foreach (getOwnersFromCarSP_Result oRow in eFactory.getOwnersFromCarSP(row.ID))
            {
                owners.Add(new OWNERS()
                {
                    Age = oRow.Age,
                    EndOfOwnership = oRow.EndOfOwnership,
                    Name = oRow.Name,
                    StartOfOwnership = oRow.StartOfOwnership
                });
            }

            cars.Add(new CAR()
            {
                MakerName = row.MakerName,
                MilesDriven = row.MilesDriven,
                ModelName = row.ModelName,
                ModelYear = row.ModelYear,
                NumDoors = row.NumDoors,
                owners = owners
            });
        }
    }
}
Onto the results. I ran this on an AMD Phenom II X6 1090T (6x3.2ghz) CPU with 16gb of DDR3-1600, running Windows 8 RTM, to give it a semi-real world feel: a decent amount of ram and 6 cores. A better test would be on an Opteron or Xeon (testing slower but more numerous cores versus the fewer, faster cores of my desktop CPU).

[Graph: Parallel.ForEach vs foreach - Y-Axis is seconds taken to process, X-Axis is the number of objects processed]

Surprisingly, for a fairly real world scenario, .NET 4.5's Parallel.ForEach actually beat out the more traditional foreach loop in every test. Even more interesting is that until around 100 objects, Parallel.ForEach wasn't visibly faster (the difference being only 0.05 seconds for 10 objects, though on a large scale/highly active data service where you're paying for CPU time, that could add up). Which brings up an interesting point: I haven't looked into cloud CPU usage/hr costs, and I wonder where the line falls between the performance of using n CPUs/cores in your cloud environment and the cost of doing so. Is 0.5 seconds at average CPU usage OK in your mind for your customers? Or will you deliver the best possible experience and either absorb the costs incurred or offload them to your customers? This would be a good investigation, I think, and a relatively simple one with the ParallelOptions.MaxDegreeOfParallelism property.

I didn't show it, but I also ran the same code using a regular for loop, as I had read an article several years ago (probably 5 at this point) that showed the foreach loop being much slower than a for loop. Surprisingly, the results were virtually identical to the foreach loop. Take that for what it is worth.
Feel free to do your own testing for your own scenarios to see if a Parallel.ForEach loop is ever slower than a foreach loop, but I am pretty comfortable saying that it seems like the Parallel.ForEach loop has been optimized to the point where it should be safe to use it in place of a foreach loop for most scenarios.
I had been wondering what effect syntax would have on performance. Thinking the compiler or runtime might handle things differently depending on the usage, I wanted to test my theory.

Using .NET 4.5 with a Win32 Console Application project type, I wrote a little application that performed a couple of trigonometric manipulations on 1 billion double variables.

For those that are not aware, using the Task Parallel Library you have 3 syntaxes for looping through objects:

Option #1 - Code within the loop's body
Parallel.ForEach(generateList(numberObjects), item =>
{
    double tmp = (Math.Tan(item) * Math.Cos(item) * Math.Sin(item)) * Math.Exp(item);
    tmp *= Math.Log(item);
});
Option #2 - Calling a function within a loop's body
Parallel.ForEach(generateList(numberObjects), item =>
{
    compute(item);
});
Option #3 - Calling a function inline
Parallel.ForEach(generateList(numberObjects), item => compute(item));
That being said, here are the benchmarks for the 3 syntaxes run 3 times:
Option #1: 4.0716071 seconds, 3.9156058 seconds, 4.009207 seconds
Option #2: 4.0376657 seconds, 4.0716071 seconds, 3.9936069 seconds
Option #3: 4.040407 seconds, 4.3836076 seconds, 4.3056075 seconds
Unfortunately nothing conclusive, so I figured I would make the operation more complex.

That being said, here are the benchmarks for the 3 syntaxes run 2 times:
Option #1: 5.4444095 seconds, 5.7313278 seconds
Option #2: 5.5848097 seconds, 5.5633182 seconds
Option #3: 5.8793363 seconds, 5.6793248 seconds
Still nothing obvious, maybe there really isn't a difference?
I found this blog post from 3/14/2012 by Stephen Toub on MSDN, which answers a lot of questions I had; it was also nice to have an approach I was considering earlier validated:
Parallel.For doesn’t just queue MaxDegreeOfParallelism tasks and block waiting for them all to complete; that would be a viable implementation if we could assume that the parallel loop is the only thing doing work on the box, but we can’t assume that, in part because of the question that spawned this blog post. Instead, Parallel.For begins by creating just one task. When that task is executed, it’ll first queue a replica of itself, and will then enlist in the processing of the loop; at this point, it’s the only task processing the loop. The loop will be processed serially until the underlying scheduler decides to spare a thread to process the queued replica. At that point, the replica task will be executed: it’ll first queue a replica of itself, and will then enlist in the processing of the loop.
So based on that response, at least in the current implementation of the Task Parallel Library in .NET 4.x, the approach is to slowly create parallel tasks as resources allow, forking off new replicas as soon, and as many times, as possible.
Diving into multi-threading the last couple nights, but not in C# like I had previously - instead with C. Long ago, I had played with SDL's built-in threading when I was working on the Infinity Project. Back then, I had just gotten a dual Athlon-XP Mobile (Barton) motherboard, so it was my first chance to play with multi-CPU programming. Fast forward 7 years: my primary desktop has 6 cores and most cell phones have at least 2 CPUs. Everything I've written this year has been with multi-threading in mind, whether it is an ASP.NET Web Application, a Windows Communication Foundation web service or a Windows Forms application.

Continuing my quest of "going back to the basics" from last weekend, I decided my next quest would be to dive back into C and attempt to port jcBENCH to Silicon Graphics' IRIX 64-bit MIPS IV platform (it was on the original list of platforms). The first major hurdle was programming C like C#: not having classes or the keyword "new", completely different syntax for certain things (structs for instance), and having to initialize arrays with malloc, only to remember after getting segmentation faults that doing so carelessly will overload the heap (the list goes on). I've gotten "lazy" with my almost exclusive use of C#, it seems, declaring an "array" like:
ConcurrentQueue<SomeObject> cqObjects = new ConcurrentQueue<SomeObject>();
After the "reintroduction" to C, I started to map out what would be necessary to make an equivalent approach to the Task Parallel Library - not necessarily the syntax, but how it handles nearly all of the work for you. Doing something like (note you don't need to assign the return value from the Entity Model; it could simply be put in the first argument of Parallel.ForEach, I just kept it there for the example):
List<SomeEntityObject> lObjects = someEntity.getObjectsSP().ToList(); // ToList to ensure there is no lazy-loading
ConcurrentQueue<SomeGenericObject> cqGenericObjects = new ConcurrentQueue<SomeGenericObject>();

Parallel.ForEach(lObjects, result =>
{
    if (result.SomeProperty > 1)
    {
        cqGenericObjects.Enqueue(new SomeGenericObject(result));
    }
});
A few things off the bat you'd have to "port":
  1. Concurrent Object Collections to support modification of collections in a thread safe manner
  2. Iteratively knowing how many cores/CPUs are available and constantly allocating new threads as threads complete (i.e. with 6 cores and 1200 tasks, kick off at least 6 threads, handle when those threads complete and "always" maintain a 6 thread count)
The latter, I imagine, is going to be a decent-sized task in itself, as it will involve platform specific system calls to determine the CPU count, breaking the task down dynamically and then managing all of the threads. At first thought, the easiest solution might simply be:
  1. Get number of CPUs/Cores, n
  2. Divide the number of "tasks" by the number of cores and allocate those tasks to each core, thus kicking off only n threads
  3. When all tasks complete resume normal application flow
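Sketched in C# for brevity (an IRIX port would use native threads, but the partitioning logic is the same), the naive static split in the steps above might look like:

```csharp
using System;
using System.Threading;

class StaticPartition
{
    // Naive static split: divide the task range evenly across n worker
    // threads, start them all, then block until every thread finishes.
    // Returns the per-worker partial results (here, simple sums).
    public static long[] Process(int[] tasks, int workerCount)
    {
        long[] partials = new long[workerCount];
        Thread[] workers = new Thread[workerCount];
        int chunk = (tasks.Length + workerCount - 1) / workerCount;

        for (int w = 0; w < workerCount; w++)
        {
            int start = w * chunk;
            int end = Math.Min(start + chunk, tasks.Length);
            int idx = w; // capture per-worker copies for the closure

            workers[w] = new Thread(() =>
            {
                for (int i = start; i < end; i++)
                    partials[idx] += tasks[i]; // stand-in for real work
            });
            workers[w].Start();
        }

        foreach (Thread t in workers)
            t.Join(); // step 3: resume normal flow once all complete

        return partials;
    }
}
```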
The problem with that (or at least one of them) is that if the actual data for certain objects is considerably more complex than for others, you could have one or more CPUs finish before the rest, which would be wasteful. You could, I guess, infer based on a sampling of the data: kick off one thread to "analyze" the data at various indexes in the passed-in array, calculate the average time taken to complete, and then anticipate the variation in task completion time in order to more evenly space out the tasks. Current CPU utilization should also be taken into account, as many operating systems keep Operating System tasks affinitized to CPU 1, so giving CPU 1 (or whichever CPU carries the Operating System load) fewer tasks to begin with might make more sense to truly optimize the threading "manager". Hopefully I can dig up some additional information on how the TPL allocates its threads to possibly provide a third alternative, since I've noticed it handles larger tasks very well across multiple threads. Definitely will post back with my findings...