[Openmp-dev] Libomptarget fatal error 1: failure of target construct while offloading is mandatory

Jonas Hahnfeld via Openmp-dev openmp-dev at lists.llvm.org
Mon Oct 1 07:13:28 PDT 2018


Okay, that was easy: https://reviews.llvm.org/D52725
This works for me; please give it a try.

Jonas
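
For readers without the attachment: the source file is not reproduced
in this thread, so the following is only a minimal sketch of the kind
of program being discussed, not the actual dot_prod_accelerator_OpenMP.c.
It assumes two target regions (the debug log quoted below shows two
kernels, main_l48 and main_l67), with a parallel reduction in the second
one. The function name, the vector values 2.0 and 3.0, and the map
clauses are assumptions chosen to be consistent with the quoted output.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reconstruction: names, values and clauses are assumptions,
 * not taken from the original source file. Returns the dot product of two
 * vectors of length n, or -1.0 if allocation fails. */
double dot_product_offload(size_t n)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double sum = 0.0;

    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }

    /* First target region: initialize both vectors on the device and
     * copy them back to the host when the region ends. */
    #pragma omp target map(from: a[0:n], b[0:n])
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        a[i] = 2.0;
        b[i] = 3.0;
    }

    /* Second target region: a parallel (thread-level) reduction, the
     * construct suspected in this thread. There is no "teams" construct,
     * so a single team is used. */
    #pragma omp target map(to: a[0:n], b[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+: sum)
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }

    free(a);
    free(b);
    return sum;
}
```

Built with the command line from the thread (clang -fopenmp
-fopenmp-targets=nvptx64-nvidia-cuda), both loops would be offloaded;
without -fopenmp the pragmas are ignored and the function runs
sequentially on the host, which is a convenient way to check the
expected result.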

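The Type= values in the quoted debug log are bit flags describing how
each argument is mapped. As a reading aid, here is a small decoder. It
assumes the flag values as I understand them from libomptarget's
omptarget.h (TO=0x001, FROM=0x002, ALWAYS=0x004, DELETE=0x008,
PTR_AND_OBJ=0x010, TARGET_PARAM=0x020); higher bits are deliberately
ignored, and the names describe_map_type and map_flags are made up for
this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed subset of libomptarget's map-type flags (low six bits only). */
static const struct {
    unsigned bit;
    const char *name;
} map_flags[] = {
    {0x001, "TO"},          {0x002, "FROM"},
    {0x004, "ALWAYS"},      {0x008, "DELETE"},
    {0x010, "PTR_AND_OBJ"}, {0x020, "TARGET_PARAM"},
};

/* Writes a '|'-separated list of the flag names set in `type` into buf
 * (at most len bytes, always NUL-terminated if len > 0) and returns buf. */
const char *describe_map_type(unsigned type, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof map_flags / sizeof map_flags[0]; ++i) {
        if ((type & map_flags[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", map_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* buffer too small: stop appending */
        used += (size_t)n;
    }
    return buf;
}
```

Under these assumptions, the 0x22 entries of the first target region
decode to FROM|TARGET_PARAM, the 0x21 entries of the second region to
TO|TARGET_PARAM, and the 8-byte 0x23 entry to TO|FROM|TARGET_PARAM.
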
On 2018-10-01 15:57, Jonas Hahnfeld via Openmp-dev wrote:
> Nope, it's broken on trunk: I get "Warp Illegal Address".
> I can confirm that the program is working correctly when compiled with
> Clang 7.0.0! (@Siegmar please ignore my previous statement about teams
> reductions; I didn't notice that you attached the source code to your
> initial post, which only uses parallel reductions.)
> 
> Interestingly, it works with Clang trunk when compiling with
> -fopenmp-cuda-force-full-runtime, so maybe the optimization that this
> flag disables is not correct for reductions?
> 
> Jonas
> 
> On 2018-10-01 15:35, George Rokos via Openmp-dev wrote:
>> Hi Siegmar,
>> 
>> Something goes wrong during the execution of the second target region.
>> The only thing I can suspect is the reduction. I don't have access to
>> a workstation right now to test this, though...
>> 
>> Can anyone confirm whether thread-level reductions are working
>> correctly in the trunk version?
>> 
>>  George
>> 
>> -------------------------
>> 
>> FROM: Siegmar Gross <siegmar.gross at informatik.hs-fulda.de>
>> SENT: 01 October 2018 15:59
>> TO: George Rokos; llvm-openmp-dev
>> SUBJECT: Re: [Openmp-dev] Libomptarget fatal error 1: failure of
>> target construct while offloading is mandatory
>> 
>> Hi George,
>> 
>> thank you very much for your suggestions.
>> 
>>> Apparently your application fails to offload to the GPU. And because
>>> offloading is mandatory (that's the default behavior) the library
>>> terminates the application.
>>> 
>>> Can you compile libomptarget in debug mode and run the app with
>>> LIBOMPTARGET_DEBUG=1 to see the debug output? That will help us
>>> identify the problem.
>> 
>> loki introduction 115 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
>> loki introduction 116 a.out
>> Number of processors:     24
>> Number of devices:        1
>> Default device:           0
>> Is initial device:        1
>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>> 
>> loki introduction 117 setenv LIBOMPTARGET_DEBUG 1
>> loki introduction 118 a.out
>> Libomptarget --> Loading RTLs...
>> Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
>> Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so': libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or directory!
>> Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
>> Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
>> Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
>> Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
>> Target CUDA RTL --> Start initializing CUDA
>> Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
>> Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 1 devices!
>> Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
>> Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
>> Libomptarget --> RTLs loaded!
>> Libomptarget --> Image 0x0000000000602090 is NOT compatible with RTL libomptarget.rtl.x86_64.so!
>> Libomptarget --> Image 0x0000000000602090 is compatible with RTL libomptarget.rtl.cuda.so!
>> Libomptarget --> RTL 0x00000000609f95d0 has index 0!
>> Libomptarget --> Registering image 0x0000000000602090 with RTL libomptarget.rtl.cuda.so!
>> Libomptarget --> Done registering entries!
>> Libomptarget --> Call to omp_get_num_devices returning 1
>> Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devicew were found)
>> Libomptarget --> Entering target region with entry point 0x00000000004012d0 and device Id -1
>> Libomptarget --> Checking whether device 0 is ready.
>> Libomptarget --> Is the device 0 (local ID 0) initialized? 0
>> Target CUDA RTL --> Getting device 0
>> Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard team limit 65536, capping at the hard limit
>> Target CUDA RTL --> Using 1024 CUDA threads per block
>> Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp size 32
>> Target CUDA RTL --> Default number of teams set according to library's default 128
>> Target CUDA RTL --> Default number of threads set according to library's default 128
>> Libomptarget --> Device 0 is ready to use.
>> Target CUDA RTL --> Load data from image 0x0000000000602090
>> Target CUDA RTL --> CUDA module successfully loaded!
>> Target CUDA RTL --> Entry point 0x0000000000000000 maps to __omp_offloading_2b_1890d30_main_l48 (0x0000000060f23320)
>> Target CUDA RTL --> Entry point 0x0000000000000001 maps to __omp_offloading_2b_1890d30_main_l67 (0x0000000060f27c70)
>> Target CUDA RTL --> Sending global device environment data 4 bytes
>> Libomptarget --> Entry  0: Base=0x0000000000613bf0, Begin=0x0000000000613bf0, Size=800000000, Type=0x22
>> Libomptarget --> Entry  1: Base=0x00000000301043f0, Begin=0x00000000301043f0, Size=800000000, Type=0x22
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Creating new map entry: HstBase=0x0000000000613bf0, HstBegin=0x0000000000613bf0, HstEnd=0x00000000301043f0, TgtBegin=0x0000000b08c20000
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b08c20000 - is new
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Creating new map entry: HstBase=0x00000000301043f0, HstBegin=0x00000000301043f0, HstEnd=0x000000005fbf4bf0, TgtBegin=0x0000000b38720000
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b38720000 - is new
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, TgtPtrBegin=0x0000000b08c20000, Size=800000000, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b08c20000 from host pointer 0x0000000000613bf0
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, TgtPtrBegin=0x0000000b38720000, Size=800000000, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b38720000 from host pointer 0x00000000301043f0
>> Libomptarget --> Launching target execution __omp_offloading_2b_1890d30_main_l48 with pointer 0x0000000060ee2ee0 (index=0).
>> Target CUDA RTL --> Setting CUDA threads per block to default 128
>> Target CUDA RTL --> Using requested number of teams 1
>> Target CUDA RTL --> Launch kernel with 1 blocks and 128 threads
>> Target CUDA RTL --> Launch of entry point at 0x0000000060ee2ee0 successful!
>> Target CUDA RTL --> Kernel execution at 0x0000000060ee2ee0 successful!
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, TgtPtrBegin=0x0000000b38720000, Size=800000000, updated RefCount=1
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b38720000 - is last
>> Libomptarget --> Moving 800000000 bytes (tgt:0x0000000b38720000) -> (hst:0x00000000301043f0)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Deleting tgt data 0x0000000b38720000 of size 800000000
>> Libomptarget --> Removing mapping with HstPtrBegin=0x00000000301043f0, TgtPtrBegin=0x0000000b38720000, Size=800000000
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, TgtPtrBegin=0x0000000b08c20000, Size=800000000, updated RefCount=1
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b08c20000 - is last
>> Libomptarget --> Moving 800000000 bytes (tgt:0x0000000b08c20000) -> (hst:0x0000000000613bf0)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Deleting tgt data 0x0000000b08c20000 of size 800000000
>> Libomptarget --> Removing mapping with HstPtrBegin=0x0000000000613bf0, TgtPtrBegin=0x0000000b08c20000, Size=800000000
>> Libomptarget --> Call to omp_get_num_devices returning 1
>> Number of processors:     24
>> Number of devices:        1
>> Default device:           0
>> Is initial device:        1
>> Libomptarget --> Entering target region with entry point 0x00000000004012d1 and device Id -1
>> Libomptarget --> Checking whether device 0 is ready.
>> Libomptarget --> Is the device 0 (local ID 0) initialized? 1
>> Libomptarget --> Device 0 is ready to use.
>> Libomptarget --> Entry  0: Base=0x0000000000613bf0, Begin=0x0000000000613bf0, Size=800000000, Type=0x21
>> Libomptarget --> Entry  1: Base=0x00000000301043f0, Begin=0x00000000301043f0, Size=800000000, Type=0x21
>> Libomptarget --> Entry  2: Base=0x00007fff707a86e8, Begin=0x00007fff707a86e8, Size=8, Type=0x23
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Creating new map entry: HstBase=0x0000000000613bf0, HstBegin=0x0000000000613bf0, HstEnd=0x00000000301043f0, TgtBegin=0x0000000b08c20000
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b08c20000 - is new
>> Libomptarget --> Moving 800000000 bytes (hst:0x0000000000613bf0) -> (tgt:0x0000000b08c20000)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Creating new map entry: HstBase=0x00000000301043f0, HstBegin=0x00000000301043f0, HstEnd=0x000000005fbf4bf0, TgtBegin=0x0000000b38720000
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b38720000 - is new
>> Libomptarget --> Moving 800000000 bytes (hst:0x00000000301043f0) -> (tgt:0x0000000b38720000)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fff707a86e8, Size=8)...
>> Libomptarget --> Creating new map entry: HstBase=0x00007fff707a86e8, HstBegin=0x00007fff707a86e8, HstEnd=0x00007fff707a86f0, TgtBegin=0x0000000b68220000
>> Libomptarget --> There are 8 bytes allocated at target address 0x0000000b68220000 - is new
>> Libomptarget --> Moving 8 bytes (hst:0x00007fff707a86e8) -> (tgt:0x0000000b68220000)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, TgtPtrBegin=0x0000000b08c20000, Size=800000000, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b08c20000 from host pointer 0x0000000000613bf0
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, TgtPtrBegin=0x0000000b38720000, Size=800000000, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b38720000 from host pointer 0x00000000301043f0
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fff707a86e8, Size=8)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x00007fff707a86e8, TgtPtrBegin=0x0000000b68220000, Size=8, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b68220000 from host pointer 0x00007fff707a86e8
>> Libomptarget --> Launching target execution __omp_offloading_2b_1890d30_main_l67 with pointer 0x0000000060ee2e70 (index=1).
>> Target CUDA RTL --> Setting CUDA threads per block to default 128
>> Target CUDA RTL --> Using requested number of teams 1
>> Target CUDA RTL --> Launch kernel with 1 blocks and 128 threads
>> Target CUDA RTL --> Launch of entry point at 0x0000000060ee2e70 successful!
>> Target CUDA RTL --> Kernel execution error at 0x0000000060ee2e70!
>> Target CUDA RTL --> CUDA error is: an illegal memory access was encountered
>> Libomptarget --> Executing target region abort target.
>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>> Libomptarget --> Unloading target library!
>> Libomptarget --> Image 0x0000000000602090 is compatible with RTL 0x00000000609f95d0!
>> Libomptarget --> Unregistered image 0x0000000000602090 from RTL 0x00000000609f95d0!
>> Libomptarget --> Done unregistering images!
>> Libomptarget --> Removing translation table for descriptor 0x0000000000613b90
>> Libomptarget --> Done unregistering library!
>> Target CUDA RTL --> Error when unloading CUDA module
>> Target CUDA RTL --> CUDA error is: an illegal memory access was encountered
>> loki introduction 119
>> 
>> Thank you very much for your help in advance.
>> 
>> Best regards
>> 
>> Siegmar
>> 
>>> 
>>> George
>>> 
>>> 
>>> --------------------------------------------------------------------------------
>>> *From:* Openmp-dev <openmp-dev-bounces at lists.llvm.org> on behalf of Siegmar Gross via Openmp-dev <openmp-dev at lists.llvm.org>
>>> *Sent:* 01 October 2018 13:26
>>> *To:* llvm-openmp-dev
>>> *Subject:* [Openmp-dev] Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>>> 
>>> Hi,
>>> 
>>> today I've installed llvm-trunk. Unfortunately, I get an error for
>>> one of my programs.
>>> 
>>> 
>>> loki introduction 110 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
>>> loki introduction 111 a.out
>>> Number of processors:     24
>>> Number of devices:        1
>>> Default device:           0
>>> Is initial device:        1
>>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>>> 
>>> loki introduction 112 setenv OMP_DEFAULT_DEVICE 1
>>> loki introduction 113 a.out
>>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>>> 
>>> loki introduction 114 clang -v
>>> clang version 8.0.0 (trunk 343447)
>>> Target: x86_64-unknown-linux-gnu
>>> Thread model: posix
>>> InstalledDir: /usr/local/llvm-trunk/bin
>>> Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>>> Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>>> Candidate multilib: .;@m64
>>> Candidate multilib: 32;@m32
>>> Selected multilib: .;@m64
>>> Found CUDA installation: /usr/local/cuda-9.0, version 9.0
>>> loki introduction 115
>>> 
>>> 
>>> 
>>> The program works fine with llvm-7.0.0.
>>> 
>>> loki introduction 125 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
>>> loki introduction 126 a.out
>>> Number of processors:     24
>>> Number of devices:        1
>>> Default device:           0
>>> Is initial device:        1
>>> sum = 6.000000e+08
>>> 
>>> loki introduction 127 setenv OMP_DEFAULT_DEVICE 1
>>> loki introduction 128 a.out
>>> Number of processors:     24
>>> Number of devices:        1
>>> Default device:           1
>>> Is initial device:        1
>>> sum = 6.000000e+08
>>> 
>>> loki introduction 129 clang -v
>>> clang version 7.0.0 (tags/RELEASE_700/final)
>>> Target: x86_64-unknown-linux-gnu
>>> Thread model: posix
>>> InstalledDir: /usr/local/llvm-7.0.0/bin
>>> Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>>> Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>>> Candidate multilib: .;@m64
>>> Candidate multilib: 32;@m32
>>> Selected multilib: .;@m64
>>> Found CUDA installation: /usr/local/cuda-9.0, version 9.0
>>> loki introduction 130
>>> 
>>> 
>>> Hopefully somebody can fix the problem. Do you need anything else to
>>> locate the error? Thank you very much for any help in advance.
>>> 
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev