Found this blog post from 3/14/2012 by Stephen Toub on MSDN, which answers a lot of questions I had and it was nice to have validated an approach I was considering earlier:
Parallel.For doesn’t just queue MaxDegreeOfParallelism tasks and block waiting for them all to complete; that would be a viable implementation if we could assume that the parallel loop is the only thing doing work on the box, but we can’t assume that, in part because of the question that spawned this blog post. Instead, Parallel.For begins by creating just one task. When that task is executed, it’ll first queue a replica of itself, and will then enlist in the processing of the loop; at this point, it’s the only task processing the loop. The loop will be processed serially until the underlying scheduler decides to spare a thread to process the queued replica. At that point, the replica task will be executed: it’ll first queue a replica of itself, and will then enlist in the processing of the loop.
So based on that response, at least in the current implementation of the Task Parallel Library in .NET 4.x, the approach is to slowly created parallel threads as the resources allow for and fork off new threads as soon and as many possible.