[Openmp-dev] questions about loop scheduling

Thu Jan 17 08:22:26 PST 2019

1. Not sure I got the question right, but if you want to get real chunk bounds inside the OpenMP runtime library, I am afraid it is impossible. Because clang goes to the runtime with normalized loop bounds, thus the runtime does not know actual values, it only knows normalized iteration space.

2. You are right, the dynamic scheduling uses single atomic operation per chunk.  To access if it is expensive or not, try to distribute 1B of chunks, and you will find that the scheduling is pretty expensive comparing to static schedule for example.  Working with a dozen of chunks the overhead will always be small (relatively).

Regards,
Andrey

From: Openmp-dev [mailto:openmp-dev-bounces at lists.llvm.org] On Behalf Of yongkee kwon via Openmp-dev
Sent: Sunday, January 13, 2019 5:51 AM
To: openmp-dev at lists.llvm.org
Subject: [Openmp-dev] questions about loop scheduling

Hi everyone,

I'd like to ask and clarify questions that I came up with while I am working on performance tracing with various loop schedules.

1. For static/chunked schedule with decreasing loop, I've observed something strange with kmp(c)_for_static_init(). I checked loop iteration lower and upper bound, a size of chunk, and a size of loop increment at the beginning and end of kmp_static_init by enabling debug (KMP_D_DEBUG) and they looked like as if I set up increasing loop while it was working well as I expected.

To be more specific, let me give you a very simple example as follows:

omp_set_num_threads(4);
#pragma omp parallel for schedule(static, 4)
for( int i = 31; i >=0; i--) { LOOP_BODY };

In this case, thread0 would get the chunk of iteration between 31 and 28 and between 15 and 12, thread1 between 27 and 24 and between 11 and 8, and so forth. However I saw kmp(c)_for_static_init for thread0 had (LB:0, UB:3, ST:16) as if it were an increasing loop starting at 0 or there should be a way for runtime to interpret a task of LB 0 as a loop iteration with the index of 31.

The reason why I care about this is because I'd like to track each work chunk by scheduling it arbitrarily and I was considering LB (lower bound) as a unique ID for each work chunk. However in this case, I don't know how I extract the correct bound associate with loop iteration number of scheduled chunk. I am also looking into an implementation of Clang codegen library for OpenMP to see if what's going on there but I feel I am a bit lost.

Please help me find what I missed and correct me if I went anything wrong.

2. To measure an overhead of dynamic/chunked scheduling, I've added callbacks at the beginning and end of __kmp_dispatch_next_algorithm() in kmp_dispatch.cpp and monitored an elapsed time in between. I was told dynamic scheduling could be pretty expensive and thus I would expect a contention around (a sort of conceptually) centralized work queue would be expensive as we have more chunks and threads as discussed in the paper [REF1].

However it seems it's pretty acceptable with moderate number of threads (16-64) and a number of chunks per thread (16-64) unless I try too extreme case such as chunk of 1. I took a quick look at the implementation and it seemed to make sense since what we paid for dynamic scheduling as compared to static scheduling is an atomic operation (test_then_inc_acq) and there was not such a centralized queue substantiated.

I'd like to just double check what I've observed make sense to those who contributed performance tune for the runtime implementation. Any comment on your experience would be greatly appreciated!

FYI, I am working on Clang 7 and LLVM OpenMP 7.0.

[REF1] OpenMP task scheduling strategies for multicore NUMA systems

--------------------------------------------------------------------
Joint Stock Company Intel A/O
Registered legal address: Krylatsky Hills Business Park,
17 Krylatskaya Str., Bldg 4, Moscow 121614,
Russian Federation

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/openmp-dev/attachments/20190117/425e8c54/attachment.html>