[Openmp-dev] Libomptarget fatal error 1: failure of target construct while offloading is mandatory
Siegmar Gross via Openmp-dev
openmp-dev at lists.llvm.org
Mon Oct  1 05:59:57 PDT 2018

Hi George,
thank you very much for your suggestions.
> Apparently your application fails to offload to the GPU. And because offloading 
> is mandatory (that's the default behavior) the library terminates the application.
> 
> Can you compile libomptarget in debug mode and run the app with 
> LIBOMPTARGET_DEBUG=1 to see the debug output? That will help us identify the 
> problem.
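
For readers who want to confirm whether a target region really ran on the GPU before digging into the debug output, the following is a minimal sketch. It is not taken from dot_prod_accelerator_OpenMP.c; the helper name target_region_ran_on_host is hypothetical, and the fallback stub exists only so the sketch also builds without -fopenmp. It relies on the standard OpenMP routine omp_is_initial_device().

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stub (assumption): lets the sketch build without -fopenmp,
   in which case everything runs on the host. */
static int omp_is_initial_device(void) { return 1; }
#endif

/* Hypothetical helper: returns 1 if the target region below executed on
   the host, 0 if it executed on an offload device. */
static int target_region_ran_on_host(void)
{
    int on_host = 1;

    /* The flag is written inside the region and copied back to the host. */
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
    return on_host;
}

/* Hypothetical helper: prints where the target region ran. */
static void report_offload_status(void)
{
    printf("Target region ran on the %s\n",
           target_region_ran_on_host() ? "host" : "device");
}
```

When offloading is mandatory and the device fails, the runtime aborts as shown in the transcript below instead of returning from the region, so a result of 1 is only seen when host fallback is allowed.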
loki introduction 115 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda 
dot_prod_accelerator_OpenMP.c
loki introduction 116 a.out
Number of processors:     24
Number of devices:        1
Default device:           0
Is initial device:        1
Libomptarget fatal error 1: failure of target construct while offloading is 
mandatory
loki introduction 117 setenv LIBOMPTARGET_DEBUG 1
loki introduction 118 a.out
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so': 
libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or 
directory!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 1 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': 
libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or 
directory!
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000000602090 is NOT compatible with RTL 
libomptarget.rtl.x86_64.so!
Libomptarget --> Image 0x0000000000602090 is compatible with RTL 
libomptarget.rtl.cuda.so!
Libomptarget --> RTL 0x00000000609f95d0 has index 0!
Libomptarget --> Registering image 0x0000000000602090 with RTL 
libomptarget.rtl.cuda.so!
Libomptarget --> Done registering entries!
Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devicew were found)
Libomptarget --> Entering target region with entry point 0x00000000004012d0 and 
device Id -1
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Target CUDA RTL --> Getting device 0
Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard team 
limit 65536, capping at the hard limit
Target CUDA RTL --> Using 1024 CUDA threads per block
Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp size 32
Target CUDA RTL --> Default number of teams set according to library's default 128
Target CUDA RTL --> Default number of threads set according to library's default 128
Libomptarget --> Device 0 is ready to use.
Target CUDA RTL --> Load data from image 0x0000000000602090
Target CUDA RTL --> CUDA module successfully loaded!
Target CUDA RTL --> Entry point 0x0000000000000000 maps to 
__omp_offloading_2b_1890d30_main_l48 (0x0000000060f23320)
Target CUDA RTL --> Entry point 0x0000000000000001 maps to 
__omp_offloading_2b_1890d30_main_l67 (0x0000000060f27c70)
Target CUDA RTL --> Sending global device environment data 4 bytes
Libomptarget --> Entry  0: Base=0x0000000000613bf0, Begin=0x0000000000613bf0, 
Size=800000000, Type=0x22
Libomptarget --> Entry  1: Base=0x00000000301043f0, Begin=0x00000000301043f0, 
Size=800000000, Type=0x22
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, 
Size=800000000)...
Libomptarget --> Creating new map entry: HstBase=0x0000000000613bf0, 
HstBegin=0x0000000000613bf0, HstEnd=0x00000000301043f0, TgtBegin=0x0000000b08c20000
Libomptarget --> There are 800000000 bytes allocated at target address 
0x0000000b08c20000 - is new
Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, 
Size=800000000)...
Libomptarget --> Creating new map entry: HstBase=0x00000000301043f0, 
HstBegin=0x00000000301043f0, HstEnd=0x000000005fbf4bf0, TgtBegin=0x0000000b38720000
Libomptarget --> There are 800000000 bytes allocated at target address 
0x0000000b38720000 - is new
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, 
Size=800000000)...
Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, 
TgtPtrBegin=0x0000000b08c20000, Size=800000000, RefCount=1
Libomptarget --> Obtained target argument 0x0000000b08c20000 from host pointer 
0x0000000000613bf0
Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, 
Size=800000000)...
Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, 
TgtPtrBegin=0x0000000b38720000, Size=800000000, RefCount=1
Libomptarget --> Obtained target argument 0x0000000b38720000 from host pointer 
0x00000000301043f0
Libomptarget --> Launching target execution __omp_offloading_2b_1890d30_main_l48 
with pointer 0x0000000060ee2ee0 (index=0).
Target CUDA RTL --> Setting CUDA threads per block to default 128
Target CUDA RTL --> Using requested number of teams 1
Target CUDA RTL --> Launch kernel with 1 blocks and 128 threads
Target CUDA RTL --> Launch of entry point at 0x0000000060ee2ee0 successful!
Target CUDA RTL --> Kernel execution at 0x0000000060ee2ee0 successful!
Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, 
Size=800000000)...
Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, 
TgtPtrBegin=0x0000000b38720000, Size=800000000, updated RefCount=1
Libomptarget --> There are 800000000 bytes allocated at target address 
0x0000000b38720000 - is last
Libomptarget --> Moving 800000000 bytes (tgt:0x0000000b38720000) -> 
(hst:0x00000000301043f0)
Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, 
Size=800000000)...
Libomptarget --> Deleting tgt data 0x0000000b38720000 of size 800000000
Libomptarget --> Removing mapping with HstPtrBegin=0x00000000301043f0, 
TgtPtrBegin=0x0000000b38720000, Size=800000000
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, 
Size=800000000)...
Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, 
TgtPtrBegin=0x0000000b08c20000, Size=800000000, updated RefCount=1
Libomptarget --> There are 800000000 bytes allocated at target address 
0x0000000b08c20000 - is last
Libomptarget --> Moving 800000000 bytes (tgt:0x0000000b08c20000) -> 
(hst:0x0000000000613bf0)
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, 
Size=800000000)...
Libomptarget --> Deleting tgt data 0x0000000b08c20000 of size 800000000
Libomptarget --> Removing mapping with HstPtrBegin=0x0000000000613bf0, 
TgtPtrBegin=0x0000000b08c20000, Size=800000000
Libomptarget --> Call to omp_get_num_devices returning 1
Number of processors:     24
Number of devices:        1
Default device:           0
Is initial device:        1
Libomptarget --> Entering target region with entry point 0x00000000004012d1 and 
device Id -1
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 1
Libomptarget --> Device 0 is ready to use.
Libomptarget --> Entry  0: Base=0x0000000000613bf0, Begin=0x0000000000613bf0, 
Size=800000000, Type=0x21
Libomptarget --> Entry  1: Base=0x00000000301043f0, Begin=0x00000000301043f0, 
Size=800000000, Type=0x21
Libomptarget --> Entry  2: Base=0x00007fff707a86e8, Begin=0x00007fff707a86e8, 
Size=8, Type=0x23
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, 
Size=800000000)...
Libomptarget --> Creating new map entry: HstBase=0x0000000000613bf0, 
HstBegin=0x0000000000613bf0, HstEnd=0x00000000301043f0, TgtBegin=0x0000000b08c20000
Libomptarget --> There are 800000000 bytes allocated at target address 
0x0000000b08c20000 - is new
Libomptarget --> Moving 800000000 bytes (hst:0x0000000000613bf0) -> 
(tgt:0x0000000b08c20000)
Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, 
Size=800000000)...
Libomptarget --> Creating new map entry: HstBase=0x00000000301043f0, 
HstBegin=0x00000000301043f0, HstEnd=0x000000005fbf4bf0, TgtBegin=0x0000000b38720000
Libomptarget --> There are 800000000 bytes allocated at target address 
0x0000000b38720000 - is new
Libomptarget --> Moving 800000000 bytes (hst:0x00000000301043f0) -> 
(tgt:0x0000000b38720000)
Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fff707a86e8, Size=8)...
Libomptarget --> Creating new map entry: HstBase=0x00007fff707a86e8, 
HstBegin=0x00007fff707a86e8, HstEnd=0x00007fff707a86f0, TgtBegin=0x0000000b68220000
Libomptarget --> There are 8 bytes allocated at target address 
0x0000000b68220000 - is new
Libomptarget --> Moving 8 bytes (hst:0x00007fff707a86e8) -> (tgt:0x0000000b68220000)
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0, 
Size=800000000)...
Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0, 
TgtPtrBegin=0x0000000b08c20000, Size=800000000, RefCount=1
Libomptarget --> Obtained target argument 0x0000000b08c20000 from host pointer 
0x0000000000613bf0
Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0, 
Size=800000000)...
Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0, 
TgtPtrBegin=0x0000000b38720000, Size=800000000, RefCount=1
Libomptarget --> Obtained target argument 0x0000000b38720000 from host pointer 
0x00000000301043f0
Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fff707a86e8, Size=8)...
Libomptarget --> Mapping exists with HstPtrBegin=0x00007fff707a86e8, 
TgtPtrBegin=0x0000000b68220000, Size=8, RefCount=1
Libomptarget --> Obtained target argument 0x0000000b68220000 from host pointer 
0x00007fff707a86e8
Libomptarget --> Launching target execution __omp_offloading_2b_1890d30_main_l67 
with pointer 0x0000000060ee2e70 (index=1).
Target CUDA RTL --> Setting CUDA threads per block to default 128
Target CUDA RTL --> Using requested number of teams 1
Target CUDA RTL --> Launch kernel with 1 blocks and 128 threads
Target CUDA RTL --> Launch of entry point at 0x0000000060ee2e70 successful!
Target CUDA RTL --> Kernel execution error at 0x0000000060ee2e70!
Target CUDA RTL --> CUDA error is: an illegal memory access was encountered
Libomptarget --> Executing target region abort target.
Libomptarget fatal error 1: failure of target construct while offloading is 
mandatory
Libomptarget --> Unloading target library!
Libomptarget --> Image 0x0000000000602090 is compatible with RTL 0x00000000609f95d0!
Libomptarget --> Unregistered image 0x0000000000602090 from RTL 0x00000000609f95d0!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x0000000000613b90
Libomptarget --> Done unregistering library!
Target CUDA RTL --> Error when unloading CUDA module
Target CUDA RTL --> CUDA error is: an illegal memory access was encountered
loki introduction 119
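
The source of dot_prod_accelerator_OpenMP.c is not part of this message. As a rough guide to the log above, here is a sketch of what the two target regions might look like. This is an assumption based only on the map entries: the first region (main_l48) maps two 800000000-byte arrays with Type=0x22 (from), and the failing region (main_l67) maps the same arrays with Type=0x21 (to) plus one 8-byte value with Type=0x23 (tofrom), which would fit a double reduction variable. The function names and the initial values 2.0 and 3.0 are hypothetical.

```c
#include <stddef.h>

/* Assumed shape of the first region: fill both vectors on the device and
   copy them back to the host (map(from:) matches Type=0x22 in the log). */
static void init_vectors(double *a, double *b, size_t n)
{
    #pragma omp target map(from: a[0:n], b[0:n])
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        a[i] = 2.0;   /* illustrative value, not from the original program */
        b[i] = 3.0;   /* illustrative value, not from the original program */
    }
}

/* Assumed shape of the second (failing) region: the vectors are copied to
   the device (Type=0x21) and the 8-byte sum is mapped tofrom (Type=0x23). */
static double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;

    #pragma omp target map(to: a[0:n], b[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

A caller would run init_vectors and then dot_product on two arrays of n doubles; 800000000 bytes corresponds to n = 100000000 doubles per array. Built with the command line from the transcript (clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda), both regions are offloaded; without -fopenmp the pragmas are ignored and the loops run on the host.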
Thank you very much in advance for your help.
Best regards
Siegmar
> 
> George
> 
> --------------------------------------------------------------------------------
> *From:* Openmp-dev <openmp-dev-bounces at lists.llvm.org> on behalf of Siegmar 
> Gross via Openmp-dev <openmp-dev at lists.llvm.org>
> *Sent:* 01 October 2018 13:26
> *To:* llvm-openmp-dev
> *Subject:* [Openmp-dev] Libomptarget fatal error 1: failure of target construct 
> while offloading is mandatory
> Hi,
> 
> today I've installed llvm-trunk. Unfortunately, I get an error for one of my
> programs.
> 
> 
> loki introduction 110 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
> dot_prod_accelerator_OpenMP.c
> loki introduction 111 a.out
> Number of processors:     24
> Number of devices:        1
> Default device:           0
> Is initial device:        1
> Libomptarget fatal error 1: failure of target construct while offloading is
> mandatory
> 
> loki introduction 112 setenv OMP_DEFAULT_DEVICE 1
> loki introduction 113 a.out
> Libomptarget fatal error 1: failure of target construct while offloading is
> mandatory
> 
> loki introduction 114 clang -v
> clang version 8.0.0 (trunk 343447)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/local/llvm-trunk/bin
> Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
> Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
> Candidate multilib: .;@m64
> Candidate multilib: 32;@m32
> Selected multilib: .;@m64
> Found CUDA installation: /usr/local/cuda-9.0, version 9.0
> loki introduction 115
> 
> 
> 
> The program works fine with llvm-7.0.0.
> 
> loki introduction 125 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
> dot_prod_accelerator_OpenMP.c
> loki introduction 126 a.out
> Number of processors:     24
> Number of devices:        1
> Default device:           0
> Is initial device:        1
> sum = 6.000000e+08
> 
> loki introduction 127 setenv OMP_DEFAULT_DEVICE 1
> loki introduction 128 a.out
> Number of processors:     24
> Number of devices:        1
> Default device:           1
> Is initial device:        1
> sum = 6.000000e+08
> 
> loki introduction 129 clang -v
> clang version 7.0.0 (tags/RELEASE_700/final)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/local/llvm-7.0.0/bin
> Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
> Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
> Candidate multilib: .;@m64
> Candidate multilib: 32;@m32
> Selected multilib: .;@m64
> Found CUDA installation: /usr/local/cuda-9.0, version 9.0
> loki introduction 130
> 
> 
> Hopefully somebody can fix the problem. Do you need anything else to locate the
> error? Thank you very much for any help in advance.
> 
> 
> Kind regards
> 
> Siegmar
    
    