<div dir="ltr"><div>On Wed, Nov 16, 2016 at 8:51 AM, Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br></div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 16 November 2016 at 11:36, Rui Ueyama <<a href="mailto:ruiu@google.com">ruiu@google.com</a>> wrote:<br>

> I think it is not beautiful to take care of granularity of tasks on caller<br>

> side, and it can be resolved with this. It works for me. What do you think?<br>

<br>

</span>I really don't think we can handle this in the callee. The question of<br>

what is a reasonable task is very dependent on the caller, and an API<br>

like parallel_for_each should not be trying to do aggregation.<br></blockquote><div><br></div><div>I don't think I understand the point. This is a parallel_for_each: we know the exact number of tasks from the beginning, so we can compute beforehand how many tasks each core will take.</div><div><br></div><div>Let's say we have 10,000 tasks and 10 cores. Then we know from the start that each core should get 1,000 tasks, so we can spawn 10 threads and give 1,000 tasks to each thread. Why doesn't this work?</div><div><br></div><div>(In reality, the jobs probably don't all have the same weight, so we want to over-split the tasks by, say, 10x to maximize core utilization. In the attached patch, I split the given tasks into up to 1024 tasks.)</div><div><br></div><div>Did you try my patch?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

This is especially true when we have completely different code for MSVC<br>

and for other compilers.<br>

<br>

Cheers,<br>

Rafael<br>

</blockquote></div><br></div></div>
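<div>For illustration, the callee-side splitting described in the reply (a known number of tasks, over-split into up to 1024 chunks) could be sketched roughly as below. This is not the attached patch: the name chunked_parallel_for_each, the MaxChunks parameter, and the choice to hand out chunks through an atomic counter on plain std::thread workers are all assumptions made for this sketch.</div>

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical helper (not the code from the patch): applies Fn to every
// element of [Begin, End), handing out work in chunks rather than one
// element at a time. Requires random-access iterators.
template <class RandomIt, class Func>
void chunked_parallel_for_each(RandomIt Begin, RandomIt End, Func Fn,
                               std::size_t MaxChunks = 1024) {
  using Diff = typename std::iterator_traits<RandomIt>::difference_type;
  const std::size_t N = static_cast<std::size_t>(std::distance(Begin, End));
  if (N == 0 || MaxChunks == 0)
    return;

  // Over-split: aim for at most MaxChunks chunks so that jobs of uneven
  // weight still balance, but never make a chunk smaller than one element.
  const std::size_t ChunkSize =
      std::max<std::size_t>(1, (N + MaxChunks - 1) / MaxChunks);
  const std::size_t NumChunks = (N + ChunkSize - 1) / ChunkSize;

  // hardware_concurrency() may return 0 when the value is unknown.
  const unsigned HW = std::thread::hardware_concurrency();
  const std::size_t NumThreads = std::min<std::size_t>(HW ? HW : 1, NumChunks);

  // Each worker repeatedly claims the next unprocessed chunk index.
  std::atomic<std::size_t> Next{0};
  auto Worker = [&] {
    for (;;) {
      const std::size_t I = Next.fetch_add(1);
      if (I >= NumChunks)
        return;
      const std::size_t Lo = I * ChunkSize;
      const std::size_t Hi = std::min(N, Lo + ChunkSize);
      std::for_each(Begin + static_cast<Diff>(Lo),
                    Begin + static_cast<Diff>(Hi), Fn);
    }
  };

  // The calling thread takes part, so only NumThreads - 1 are spawned.
  std::vector<std::thread> Threads;
  for (std::size_t T = 1; T < NumThreads; ++T)
    Threads.emplace_back(Worker);
  Worker();
  for (std::thread &T : Threads)
    T.join();
}
```

<div>With 10,000 elements and the default MaxChunks of 1024, this yields chunks of 10 elements each, so the caller does not have to pick a granularity. Whether that default suits every caller is exactly the point under discussion in this thread.</div>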