[Openmp-dev] Libomptarget fatal error 1: failure of target construct while offloading is mandatory
Jonas Hahnfeld via Openmp-dev
openmp-dev at lists.llvm.org
Mon Oct 1 07:13:28 PDT 2018
Okay, that was easy: https://reviews.llvm.org/D52725
This works for me, please give it a try.
Jonas
On 2018-10-01 15:57, Jonas Hahnfeld via Openmp-dev wrote:
> Nope, it's broken: "Warp Illegal Address".
> I can confirm that the program works correctly when compiled with
> Clang 7.0.0! (@Siegmar: please ignore my previous statement about teams
> reductions; I didn't notice that you had attached the source code in
> your initial post, which only uses parallel reductions.)
>
> Interestingly, it works with Clang trunk when compiling with
> -fopenmp-cuda-force-full-runtime, so maybe that optimization is not
> correct for reductions?
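> For reference, the workaround is just one extra compile flag; a sketch
> of the two invocations (file name and target triple taken from
> Siegmar's session, everything else as usual):

```shell
# Plain offloading build (crashes at runtime with trunk, per this thread):
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      dot_prod_accelerator_OpenMP.c

# Workaround observed above: force the full NVPTX device runtime
# instead of the optimized lightweight one:
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      -fopenmp-cuda-force-full-runtime \
      dot_prod_accelerator_OpenMP.c
```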
>
> Jonas
>
> On 2018-10-01 15:35, George Rokos via Openmp-dev wrote:
>> Hi Siegmar,
>>
>> Something happens during the execution of the second target region.
>> The only thing I can suspect is the reduction. I don't have access to
>> a workstation right now to test this, though...
>>
>> Can anyone confirm whether thread reductions are working correctly in
>> the trunk version?
>>
>> George
>>
>> -------------------------
>>
>> FROM: Siegmar Gross <siegmar.gross at informatik.hs-fulda.de>
>> SENT: 01 October 2018 15:59
>> TO: George Rokos; llvm-openmp-dev
>> SUBJECT: Re: [Openmp-dev] Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>>
>> Hi George,
>>
>> thank you very much for your suggestions.
>>
>>> Apparently your application fails to offload to the GPU. And because
>>> offloading is mandatory (that's the default behavior) the library
>>> terminates the application.
>>>
>>> Can you compile libomptarget in debug mode and run the app with
>>> LIBOMPTARGET_DEBUG=1 to see the debug output? That will help us
>>> identify the problem.
>>
>> loki introduction 115 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
>> loki introduction 116 a.out
>> Number of processors: 24
>> Number of devices: 1
>> Default device: 0
>> Is initial device: 1
>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>>
>> loki introduction 117 setenv LIBOMPTARGET_DEBUG 1
>> loki introduction 118 a.out
>> Libomptarget --> Loading RTLs...
>> Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
>> Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so': libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or directory!
>> Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
>> Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
>> Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
>> Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
>> Target CUDA RTL --> Start initializing CUDA
>> Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
>> Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 1 devices!
>> Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
>> Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
>> Libomptarget --> RTLs loaded!
>> Libomptarget --> Image 0x0000000000602090 is NOT compatible with RTL libomptarget.rtl.x86_64.so!
>> Libomptarget --> Image 0x0000000000602090 is compatible with RTL libomptarget.rtl.cuda.so!
>> Libomptarget --> RTL 0x00000000609f95d0 has index 0!
>> Libomptarget --> Registering image 0x0000000000602090 with RTL libomptarget.rtl.cuda.so!
>> Libomptarget --> Done registering entries!
>> Libomptarget --> Call to omp_get_num_devices returning 1
>> Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devicew were found)
>> Libomptarget --> Entering target region with entry point 0x00000000004012d0 and device Id -1
>> Libomptarget --> Checking whether device 0 is ready.
>> Libomptarget --> Is the device 0 (local ID 0) initialized? 0
>> Target CUDA RTL --> Getting device 0
>> Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard team limit 65536, capping at the hard limit
>> Target CUDA RTL --> Using 1024 CUDA threads per block
>> Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp size 32
>> Target CUDA RTL --> Default number of teams set according to library's default 128
>> Target CUDA RTL --> Default number of threads set according to library's default 128
>> Libomptarget --> Device 0 is ready to use.
>> Target CUDA RTL --> Load data from image 0x0000000000602090
>> Target CUDA RTL --> CUDA module successfully loaded!
>> Target CUDA RTL --> Entry point 0x0000000000000000 maps to __omp_offloading_2b_1890d30_main_l48 (0x0000000060f23320)
>> Target CUDA RTL --> Entry point 0x0000000000000001 maps to __omp_offloading_2b_1890d30_main_l67 (0x0000000060f27c70)
>> Target CUDA RTL --> Sending global device environment data 4 bytes
>> Libomptarget --> Entry 0: Base=0x0000000000613bf0, Begin=0x0000000000613bf0, Size=800000000, Type=0x22
>> Libomptarget --> Entry 1: Base=0x00000000301043f0, Begin=0x00000000301043f0, Size=800000000, Type=0x22
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Creating new map entry: HstBase=0x0000000000613bf0, HstBegin=0x0000000000613bf0, HstEnd=0x00000000301043f0, TgtBegin=0x0000000b08c20000
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b08c20000 - is new
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Creating new map entry: HstBase=0x00000000301043f0, HstBegin=0x00000000301043f0, HstEnd=0x000000005fbf4bf0, TgtBegin=0x0000000b38720000
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b38720000 - is new
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, TgtPtrBegin=0x0000000b08c20000, Size=800000000, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b08c20000 from host pointer 0x0000000000613bf0
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, TgtPtrBegin=0x0000000b38720000, Size=800000000, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b38720000 from host pointer 0x00000000301043f0
>> Libomptarget --> Launching target execution __omp_offloading_2b_1890d30_main_l48 with pointer 0x0000000060ee2ee0 (index=0).
>> Target CUDA RTL --> Setting CUDA threads per block to default 128
>> Target CUDA RTL --> Using requested number of teams 1
>> Target CUDA RTL --> Launch kernel with 1 blocks and 128 threads
>> Target CUDA RTL --> Launch of entry point at 0x0000000060ee2ee0 successful!
>> Target CUDA RTL --> Kernel execution at 0x0000000060ee2ee0 successful!
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, TgtPtrBegin=0x0000000b38720000, Size=800000000, updated RefCount=1
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b38720000 - is last
>> Libomptarget --> Moving 800000000 bytes (tgt:0x0000000b38720000) -> (hst:0x00000000301043f0)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Deleting tgt data 0x0000000b38720000 of size 800000000
>> Libomptarget --> Removing mapping with HstPtrBegin=0x00000000301043f0, TgtPtrBegin=0x0000000b38720000, Size=800000000
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, TgtPtrBegin=0x0000000b08c20000, Size=800000000, updated RefCount=1
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b08c20000 - is last
>> Libomptarget --> Moving 800000000 bytes (tgt:0x0000000b08c20000) -> (hst:0x0000000000613bf0)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Deleting tgt data 0x0000000b08c20000 of size 800000000
>> Libomptarget --> Removing mapping with HstPtrBegin=0x0000000000613bf0, TgtPtrBegin=0x0000000b08c20000, Size=800000000
>> Libomptarget --> Call to omp_get_num_devices returning 1
>> Number of processors: 24
>> Number of devices: 1
>> Default device: 0
>> Is initial device: 1
>> Libomptarget --> Entering target region with entry point 0x00000000004012d1 and device Id -1
>> Libomptarget --> Checking whether device 0 is ready.
>> Libomptarget --> Is the device 0 (local ID 0) initialized? 1
>> Libomptarget --> Device 0 is ready to use.
>> Libomptarget --> Entry 0: Base=0x0000000000613bf0, Begin=0x0000000000613bf0, Size=800000000, Type=0x21
>> Libomptarget --> Entry 1: Base=0x00000000301043f0, Begin=0x00000000301043f0, Size=800000000, Type=0x21
>> Libomptarget --> Entry 2: Base=0x00007fff707a86e8, Begin=0x00007fff707a86e8, Size=8, Type=0x23
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Creating new map entry: HstBase=0x0000000000613bf0, HstBegin=0x0000000000613bf0, HstEnd=0x00000000301043f0, TgtBegin=0x0000000b08c20000
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b08c20000 - is new
>> Libomptarget --> Moving 800000000 bytes (hst:0x0000000000613bf0) -> (tgt:0x0000000b08c20000)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Creating new map entry: HstBase=0x00000000301043f0, HstBegin=0x00000000301043f0, HstEnd=0x000000005fbf4bf0, TgtBegin=0x0000000b38720000
>> Libomptarget --> There are 800000000 bytes allocated at target address 0x0000000b38720000 - is new
>> Libomptarget --> Moving 800000000 bytes (hst:0x00000000301043f0) -> (tgt:0x0000000b38720000)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fff707a86e8, Size=8)...
>> Libomptarget --> Creating new map entry: HstBase=0x00007fff707a86e8, HstBegin=0x00007fff707a86e8, HstEnd=0x00007fff707a86f0, TgtBegin=0x0000000b68220000
>> Libomptarget --> There are 8 bytes allocated at target address 0x0000000b68220000 - is new
>> Libomptarget --> Moving 8 bytes (hst:0x00007fff707a86e8) -> (tgt:0x0000000b68220000)
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, TgtPtrBegin=0x0000000b08c20000, Size=800000000, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b08c20000 from host pointer 0x0000000000613bf0
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, Size=800000000)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, TgtPtrBegin=0x0000000b38720000, Size=800000000, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b38720000 from host pointer 0x00000000301043f0
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fff707a86e8, Size=8)...
>> Libomptarget --> Mapping exists with HstPtrBegin=0x00007fff707a86e8, TgtPtrBegin=0x0000000b68220000, Size=8, RefCount=1
>> Libomptarget --> Obtained target argument 0x0000000b68220000 from host pointer 0x00007fff707a86e8
>> Libomptarget --> Launching target execution __omp_offloading_2b_1890d30_main_l67 with pointer 0x0000000060ee2e70 (index=1).
>> Target CUDA RTL --> Setting CUDA threads per block to default 128
>> Target CUDA RTL --> Using requested number of teams 1
>> Target CUDA RTL --> Launch kernel with 1 blocks and 128 threads
>> Target CUDA RTL --> Launch of entry point at 0x0000000060ee2e70 successful!
>> Target CUDA RTL --> Kernel execution error at 0x0000000060ee2e70!
>> Target CUDA RTL --> CUDA error is: an illegal memory access was encountered
>> Libomptarget --> Executing target region abort target.
>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>> Libomptarget --> Unloading target library!
>> Libomptarget --> Image 0x0000000000602090 is compatible with RTL 0x00000000609f95d0!
>> Libomptarget --> Unregistered image 0x0000000000602090 from RTL 0x00000000609f95d0!
>> Libomptarget --> Done unregistering images!
>> Libomptarget --> Removing translation table for descriptor 0x0000000000613b90
>> Libomptarget --> Done unregistering library!
>> Target CUDA RTL --> Error when unloading CUDA module
>> Target CUDA RTL --> CUDA error is: an illegal memory access was encountered
>> loki introduction 119
>>
>> Thank you very much for your help in advance.
>>
>> Best regards
>>
>> Siegmar
>>
>>>
>>> George
>>>
>>>
>> --------------------------------------------------------------------------------
>>> *From:* Openmp-dev <openmp-dev-bounces at lists.llvm.org> on behalf of Siegmar Gross via Openmp-dev <openmp-dev at lists.llvm.org>
>>> *Sent:* 01 October 2018 13:26
>>> *To:* llvm-openmp-dev
>>> *Subject:* [Openmp-dev] Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>>> Hi,
>>>
>>> today I've installed llvm-trunk. Unfortunately, I get an error for one of my
>>> programs.
>>>
>>>
>>> loki introduction 110 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
>>> loki introduction 111 a.out
>>> Number of processors: 24
>>> Number of devices: 1
>>> Default device: 0
>>> Is initial device: 1
>>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>>>
>>> loki introduction 112 setenv OMP_DEFAULT_DEVICE 1
>>> loki introduction 113 a.out
>>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>>>
>>> loki introduction 114 clang -v
>>> clang version 8.0.0 (trunk 343447)
>>> Target: x86_64-unknown-linux-gnu
>>> Thread model: posix
>>> InstalledDir: /usr/local/llvm-trunk/bin
>>> Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>>> Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>>> Candidate multilib: .;@m64
>>> Candidate multilib: 32;@m32
>>> Selected multilib: .;@m64
>>> Found CUDA installation: /usr/local/cuda-9.0, version 9.0
>>> loki introduction 115
>>>
>>>
>>>
>>> The program works fine with llvm-7.0.0.
>>>
>>> loki introduction 125 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
>>> loki introduction 126 a.out
>>> Number of processors: 24
>>> Number of devices: 1
>>> Default device: 0
>>> Is initial device: 1
>>> sum = 6.000000e+08
>>>
>>> loki introduction 127 setenv OMP_DEFAULT_DEVICE 1
>>> loki introduction 128 a.out
>>> Number of processors: 24
>>> Number of devices: 1
>>> Default device: 1
>>> Is initial device: 1
>>> sum = 6.000000e+08
>>>
>>> loki introduction 129 clang -v
>>> clang version 7.0.0 (tags/RELEASE_700/final)
>>> Target: x86_64-unknown-linux-gnu
>>> Thread model: posix
>>> InstalledDir: /usr/local/llvm-7.0.0/bin
>>> Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>>> Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>>> Candidate multilib: .;@m64
>>> Candidate multilib: 32;@m32
>>> Selected multilib: .;@m64
>>> Found CUDA installation: /usr/local/cuda-9.0, version 9.0
>>> loki introduction 130
>>>
>>>
>>> Hopefully somebody can fix the problem. Do you need anything else to
>>> locate the error? Thank you very much for any help in advance.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev