
A pretty common task I run across is counting the number of occurrences of a specific string, primary key ID, etc. A few examples: checking a valid username/password combination, or checking for an existing value in the database to prevent duplicate/redundant data. If there were no joins involved, I would typically just do something like the following:
public bool DoesExist(string someValue) {
    using (SomeEntity eFactory = new SomeEntity()) {
        return eFactory.SomeTable.Where(a => a.Value == someValue).Count() > 0;
    }
}
Or use the PLINQ version if there were a considerable number of rows, on the assumption that for smaller tables the overhead involved in PLINQ would negate any performance advantage:
public bool DoesExist(string someValue) {
    using (SomeEntity eFactory = new SomeEntity()) {
        return eFactory.SomeTable.AsParallel().Where(a => a.Value == someValue).Count() > 0;
    }
}
However, if there were multiple tables involved, I would create a Stored Procedure and return the count in a Complex Type like so:
public bool DoesExist(string someValue) {
    using (SomeEntity eFactory = new SomeEntity()) {
        return eFactory.SomeTableSP(someValue).FirstOrDefault().Value > 0;
    }
}
Intrigued by what the real performance impact was across the board, and wanting to figure out what made sense depending on the situation, I created a common scenario: a Users table like so:

[Image: Users SQL Server Table Schema]

I populated this table with random data, from 100 to 4,000 rows, and ran the coding scenarios above against it, averaging 3 separate runs to rule out any fluke scores. In addition, I tested looking for the same value 3 times and a random value 3 times to see if the row's position would affect performance (whether it was near the end of the table or closer to the beginning). I should note this was tested on my HP DV7 laptop with an A10-4600M (4x2.3GHz CPU), 16GB of RAM and a SanDisk Extreme 240GB SSD, running Windows 8 x64.

[Image: LINQ vs PLINQ vs Stored Procedure Count Performance Graph]

The most interesting aspect for me was the consistent performance of the Stored Procedure across the board, no matter how many rows there were. I imagine the results would be similar at 10,000, 20,000, etc.; I'll have to run those tests later. In addition, I imagine that as soon as table joins come into the picture, the difference between a Stored Procedure and a LINQ query would be even greater. So, bottom line: use a Stored Procedure for counts. The extra time it takes to create a Stored Procedure and import it into Visual Studio (especially in Visual Studio 2012, where it automatically creates the Complex Type for you) is well worth it.
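For reference, here is a minimal sketch of the timing harness I describe above, assuming DoesExist wrappers like the snippets earlier in the post; the class and method names here are illustrative rather than the exact code I ran:

using System;
using System.Diagnostics;

public static class CountTimer {
    // Runs the supplied existence check several times and returns the average
    // elapsed milliseconds, mirroring the "average of 3 runs" approach above.
    public static double AverageMs(Func<string, bool> doesExist, string value, int runs = 3) {
        double totalMs = 0;
        for (int i = 0; i < runs; i++) {
            Stopwatch sw = Stopwatch.StartNew();
            doesExist(value);
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds;
        }
        return totalMs / runs;
    }
}

Each of the three DoesExist variants can then be passed in and compared at each row count.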
After some more thought about jcBENCH and what its real purpose was, I am going to drop the Solaris and IRIX ports. Solaris has a Mono port, but I only have a Sun Blade 100, which has a single CPU, so I'm not expecting a ton of performance from that. IRIX, on the other hand: I have a quad R14k 500 Origin 300, but no port of Mono exists. I could port it to Java, but then you really couldn't compare its benchmarks against the Mono/.NET versions. I am about 50% done with the Android port and am just waiting for the OpenSUSE 12.1-compatible MonoDevelop release so I can get started on the Linux port.

After those 2 ports are completed, I am thinking of starting something entirely new that I have been mulling over for the last couple of years. Those who deal with a SQL database and write a data layer for their .NET projects know the shortcomings of doing either of the following:
  1. Using an ADO.NET Entity Model, adding your Tables, Views and Stored Procedures, and then using it as-is or extending it with some business logic (a small sketch of this follows the list)
  2. Using an all-custom data layer built on the base DataTable, DataRow, etc., wrapping your objects with partial classes and creating a "factory"
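As a small, hypothetical sketch of option 1 (the User entity and its properties are stand-ins for whatever the Entity Model designer would generate, included here only so the example compiles on its own):

using System;

// Generated half of the entity -- normally produced by the Entity Model designer.
public partial class User {
    public bool Deleted { get; set; }
    public DateTime LastLogin { get; set; }
}

// Hand-written half -- business logic layered on top of the generated entity.
public partial class User {
    public bool IsRecentlyActive() {
        return !Deleted && LastLogin > DateTime.Now.AddMonths(-6);
    }
}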
Both approaches have their pros and cons. The first takes a lot less time, but you also have a lot less control, and it could be costly with all of the overhead. Both, however, will eventually fall apart down the road. The reason: they were built for one audience and one production server (or set of servers). How many times have you gone to your IT Manager and asked for a new database server because it was quicker than really going back to the architecture of your data layer? As time goes on, this can happen over and over again. I have personally witnessed such an event: a system was designed and built for around 50 internal users on a single-CPU web server and a dual-Xeon database server. Over 5 years later the code has remained the same, yet it has been moved to 6 different servers of ever-increasing speed.

Times have changed and will continue to change. Workloads vary from day to day and servers are swapped in and out, so my solution is an adaptive, dynamic data layer: one that profiles itself and uses that data to decide whether to run single-threaded LINQ queries or PLINQ queries, depending on whether the added overhead of PLINQ would outweigh the time it would take using only one CPU. In addition, it would use Microsoft's AppFabric to cache the commonly used, intensive queries that maybe only get run once an hour and whose data doesn't change for 24 hours. This doesn't come without a price, of course; having only architected this model in my head, I can't say for certain how much overhead the profiling will add. Over the next couple of months I'll be developing this, so stay tuned. jcBENCH, as you might have guessed, was kind of an early test scenario for seeing how various platforms handled multi-threaded tasks of varying intensity.
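To make the idea a bit more concrete, here is a very rough sketch of the decision I have in mind, with a purely hypothetical threshold standing in for whatever the self-profiling step would actually learn (and leaving the AppFabric caching out entirely):

using System;
using System.Collections.Generic;
using System.Linq;

public class AdaptiveQueryEngine {
    // Hypothetical value the profiler would tune over time: below it, PLINQ's
    // startup/partitioning overhead outweighs using more than one CPU.
    public int ParallelRowThreshold { get; set; } = 5000;

    public bool DoesExist<T>(IEnumerable<T> rows, Func<T, bool> predicate, int estimatedRowCount) {
        if (estimatedRowCount < ParallelRowThreshold) {
            return rows.Where(predicate).Count() > 0;              // single-threaded LINQ
        }
        return rows.AsParallel().Where(predicate).Count() > 0;     // PLINQ for larger sets
    }
}

A real version would of course record actual query timings per server rather than leaning on a fixed row count, but it shows the kind of decision the profiling data would drive.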