[Openmp-dev] How to apply custom pass to OpenMP offloading code

Johannes Doerfert via Openmp-dev openmp-dev at lists.llvm.org
Thu Nov 19 13:09:13 PST 2020


If you open the IR module you generate, you'll see that it contains both
host and device code (that is why opt fails at the second
source_filename line). Short of manually applying all the steps the
driver takes to build an offload binary, you cannot run custom passes
via opt. Load them via a plugin instead; that's the proper way to add
custom passes.

~ Johannes
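
A minimal sketch of the point above, assuming each module clang writes contains exactly one line beginning with "source_filename": counting those lines shows whether a .ll file holds more than one module. module_starts is a hypothetical helper, not an LLVM API; it only scans text.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of LLVM): returns the 1-based line numbers
// of every line that starts with "source_filename" in textual IR.
// Assumption: each module in the file has exactly one such line, so more
// than one entry means several modules were written into the same file.
std::vector<std::size_t> module_starts(const std::string& ir_text) {
  std::vector<std::size_t> starts;
  std::istringstream in(ir_text);
  std::string line;
  std::size_t lineno = 0;
  while (std::getline(in, line)) {
    ++lineno;
    // rfind(prefix, 0) == 0 is the usual pre-C++20 "starts_with" idiom.
    if (line.rfind("source_filename", 0) == 0) {
      starts.push_back(lineno);
    }
  }
  return starts;
}
```

For the test.ll in the report below, this would be expected to return two entries, the second being roughly where opt stops with "expected top-level entity"; a .ll built without -fopenmp-targets should return one.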


On 11/19/20 11:18 AM, Alok Mishra via Openmp-dev wrote:
> I'm trying to develop and apply my custom pass to an OpenMP target
> offloading code. But when using 'opt' I face the following error:
>    opt: <stdin>:29:1: error: expected top-level entity
>    source_filename = "test.cpp"
>
> I receive this error message even if I do not use my pass and just try to
> use opt with -O1.
>
> Sample code I tried to build:
>    $ cat test.cpp
>    int main()
>    {
>    #pragma omp target teams distribute parallel for
>      for(int i=0; i<1000;i++);
>      return 0;
>    }
>
> Command used to build the code and run
>    $ clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_70 test.cpp
>    $ nvprof ./a.out
>    ==85306== NVPROF is profiling process 85306, command: ./a.out
>    ==85306== Profiling application: ./a.out
>    ==85306== Profiling result:
>                Type  Time(%)      Time     Calls       Avg       Min       Max  Name
>     GPU activities:   91.79%  30.752us         1  30.752us  30.752us  30.752us  __omp_offloading_2c_715c4b_main_l3
>                        4.68%  1.5680us         1  1.5680us  1.5680us  1.5680us  [CUDA memcpy DtoH]
>                        3.53%  1.1840us         1  1.1840us  1.1840us  1.1840us  [CUDA memcpy HtoD]
>          API calls:   80.54%  362.46ms         1  362.46ms  362.46ms  362.46ms  cuDevicePrimaryCtxRetain
>                       17.76%  79.946ms         1  79.946ms  79.946ms  79.946ms  cuDevicePrimaryCtxRelease
>                        1.00%  4.4941ms         1  4.4941ms  4.4941ms  4.4941ms  cuModuleLoadDataEx
>                        0.54%  2.4175ms         1  2.4175ms  2.4175ms  2.4175ms  cuModuleUnload
>                        0.10%  446.26us        32  13.945us  2.0710us  174.72us  cuStreamCreate
>                        0.02%  111.17us        32  3.4730us  2.9480us  13.457us  cuStreamDestroy
>                        0.01%  64.949us         1  64.949us  64.949us  64.949us  cuStreamSynchronize
>                        0.01%  44.065us         1  44.065us  44.065us  44.065us  cuMemcpyDtoH
>                        0.01%  27.451us         1  27.451us  27.451us  27.451us  cuLaunchKernel
>                        0.00%  12.617us         1  12.617us  12.617us  12.617us  cuDeviceGetPCIBusId
>                        0.00%  9.4680us         1  9.4680us  9.4680us  9.4680us  cuMemcpyHtoD
>                        0.00%  5.9570us         1  5.9570us  5.9570us  5.9570us  cuModuleGetFunction
>                        0.00%  5.1610us         2  2.5800us  1.1090us  4.0520us  cuModuleGetGlobal
>                        0.00%  5.1190us         6     853ns     277ns  1.9740us  cuCtxSetCurrent
>                        0.00%  3.5270us         6     587ns     214ns  1.2310us  cuDeviceGetAttribute
>                        0.00%  3.4210us         3  1.1400us     465ns  2.4020us  cuDeviceGetCount
>                        0.00%  3.3210us         2  1.6600us  1.5970us  1.7240us  cuDeviceGet
>                        0.00%  1.6230us         1  1.6230us  1.6230us  1.6230us  cuFuncGetAttribute
>                        0.00%  1.1600us         1  1.1600us  1.1600us  1.1600us  cuDevicePrimaryCtxGetState
>                        0.00%     685ns         1     685ns     685ns     685ns  cuDevicePrimaryCtxSetFlags
>                        0.00%     346ns         1     346ns     346ns     346ns  cuCtxGetDevice
>
> This shows that my Clang and OpenMP are built properly and target
> offloading works.
>
> Next, I'm trying to apply any pass at all to this code, so I convert the
> code into LLVM IR and then apply -O1 to it using opt.
>    $ clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_70 -emit-llvm -S test.cpp
>    $ opt -O1 < test.ll
>    opt: <stdin>:891:1: error: expected top-level entity
>    source_filename = "test.cpp"
>    ^
>
> I'm able to use opt on any code that does not use target offloading. I get
> this error only when I use OpenMP target offloading.
> Please advise what I am doing wrong here.
>
> --
> Thank You.
> Regards,
> Alok
> *'FOR THE GREATER GOOD'*
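
If the immediate goal is only to get files that opt will parse, the bundled .ll can be split apart. A minimal sketch, assuming the file uses the clang-offload-bundler text format, in which each module is wrapped in comment lines starting with "; __CLANG_OFFLOAD_BUNDLE____START__ " and "; __CLANG_OFFLOAD_BUNDLE____END__ " followed by a target string (check your own test.ll; the exact target strings differ between Clang versions). split_offload_bundle is a hypothetical name; the clang-offload-bundler tool shipped with Clang, run with -unbundle, is the real way to do this.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: splits a textual offload bundle into one IR string
// per bundle entry, keyed by the target string after the START marker.
// ASSUMPTION: entries are delimited by clang-offload-bundler text markers:
//   ; __CLANG_OFFLOAD_BUNDLE____START__ <target>
//   ; __CLANG_OFFLOAD_BUNDLE____END__ <target>
// Lines outside any START/END pair are dropped.
std::map<std::string, std::string> split_offload_bundle(const std::string& text) {
  const std::string start = "; __CLANG_OFFLOAD_BUNDLE____START__ ";
  const std::string end = "; __CLANG_OFFLOAD_BUNDLE____END__ ";
  std::map<std::string, std::string> bundles;
  std::istringstream in(text);
  std::string line;
  std::string current;  // target string of the entry being read
  bool inside = false;  // true between a START and its END marker
  while (std::getline(in, line)) {
    if (line.rfind(start, 0) == 0) {
      current = line.substr(start.size());
      inside = true;
      bundles[current];  // create the entry even if it stays empty
    } else if (line.rfind(end, 0) == 0) {
      inside = false;
    } else if (inside) {
      bundles[current] += line + '\n';
    }
  }
  return bundles;
}
```

Each returned string can be saved as its own .ll file and passed to opt. As the reply above says, though, rebuilding a working offload binary from the optimized modules still means repeating the remaining driver steps by hand, which is why a pass plugin is the recommended route.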