[Openmp-dev] omp_set_num_threads() with target region

Churbanov, Andrey via Openmp-dev openmp-dev at lists.llvm.org
Tue Oct 29 05:30:24 PDT 2019


Hi Simone,

A couple of notes:


1.       omp_set_num_threads() is not supposed to limit the number of threads inside target or teams constructs. It only affects parallel regions encountered by the same (implicit) task, and the target construct is supposed to initialize its own separate execution environment in an implementation-specific manner. The OpenMP specification currently has no API to control the number of teams or the number of threads in each team; the only standard control is the thread_limit clause on the teams construct. As an extension, the runtime library honors the OMP_NUM_THREADS environment variable to limit the number of threads in a teams construct executed on the host device. A new release of the OpenMP specification (Technical Report 8) is expected in about a month; it will introduce the environment variables OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT and the corresponding APIs omp_set_num_teams and omp_set_teams_thread_limit for controlling the number of teams and the number of threads in each team. These are not yet implemented in the runtime library, though, and I'd guess they still won't work for the "target teams" construct, only for "teams" without "target".


2.       Even if you manage to limit the number of threads to 1 (and, obviously, the number of teams to 1 as well), you won't get the expected result, because the lastprivate clause is not implemented (or its implementation has bugs) for the "target teams distribute parallel for" construct on the host device in clang. Maybe the compiler team knows more details here, as the runtime library seems to provide the needed support. We can discuss this issue separately if needed.

Regards,
Andrey

From: Openmp-dev [mailto:openmp-dev-bounces at lists.llvm.org] On Behalf Of Simone Atzeni via Openmp-dev
Sent: Tuesday, October 29, 2019 7:43 AM
To: openmp-dev at lists.llvm.org
Subject: [Openmp-dev] omp_set_num_threads() with target region

Hi,

It looks like "omp_set_num_threads()" does not affect the number of threads in a parallel region that belongs to a combined target construct.

Consider this code:

#include <omp.h>
#include <stdio.h>

int main()
{
    int lp = 0;

    omp_set_num_threads(1);
#pragma omp target teams distribute parallel for lastprivate(lp)
    for(int i = 1; i < 10; i++) {
        printf("tid: %d\n", omp_get_thread_num());
        if(i == 1) {
            lp = 0;
        }
        lp++;
    }
    printf("lp: %d\n", lp);

    return 0;
}

When compiled with `clang -fopenmp test.c`, this code prints a value of `lp` equal to 0, and the parallel region is executed by 9 threads.
Shouldn't `omp_set_num_threads(1)` set only one thread for the parallel region, so that the expected result is "9"?

Could this be a runtime bug? Otherwise, please help me understand this.

Thanks!
Simone
