ruiu added a comment. This doesn't seem correct. Imagine that you have an infinite number of cores. Then the new code would take twice as long as the old code. I guess you are "fixing" the problem in the wrong place. https://reviews.llvm.org/D36607