[Openmp-dev] Target CUDA RTL --> CUDA error is: an illegal memory access was encountered

Itaru Kitayama via Openmp-dev openmp-dev at lists.llvm.org
Sun Feb 9 22:00:07 PST 2020


All,

A similar code example is shown below:
#include <stdio.h>
#include <iostream>

class AbsBase {
public:
        virtual int f() = 0;
        virtual void map_in() = 0;
        virtual void map_out() = 0;
};

class Derived : public AbsBase {
private:
        int a,b,c;
        //int *arr = new int[100];
public:
        virtual int f() { return a; }
        int f2() { return a; }
        void fillarray() {
                //arr[0] = 1234;
                a =56;
        }
        virtual void map_in() {
                #pragma omp target enter data map(to:this[0:1])
                //#pragma omp target enter data map(to:this->arr[0:100])
        }
        virtual void map_out() {
                #pragma omp target exit data map(from:this[0:1])
                //#pragma omp target exit data map(from:this->arr[0:1])
        }
};

int main() {
        AbsBase *absBase[2];
        absBase[0] = new Derived();
        std::cout << "pointer is " << absBase[0] << std::endl;
        absBase[1] = new Derived();
        static_cast<Derived*>(absBase[0])->fillarray();
        static_cast<Derived*>(absBase[0])->map_in();
        static_cast<Derived*>(absBase[1])->map_in();
        std::cout << sizeof(Derived) << std::endl;
        int a1, a2, a3;
#pragma omp target parallel for map(to: a1,a2,a3) is_device_ptr(absBase)
        for (int i=0;i<10;i++) {
                //Derived d1(*static_cast<Derived*>(absBase));
                //printf("arr[0] is %d\n", d1.f2());
                printf("%d\n", static_cast<Derived*>(absBase[0])->f2());

        }
        //static_cast<Derived*>(absBase)->map_out();
}

It ends up with an illegal memory access error at run time. Can you spot
immediately what is wrong with this code?
In the debug log, the map-in of the two objects seems to have succeeded,
placing them in clearly separate areas on the device.
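
(For comparison, a rough sketch of the variant that did print the expected
value earlier in this thread: cast to Derived* on the host and map the
pointee with a map clause, rather than handing a host pointer array to
is_device_ptr. This reuses the class definitions above; whether mapping a
type with virtual functions this way is well defined is discussed further
down in the thread.)

        AbsBase *absBase = new Derived();
        Derived *p = static_cast<Derived*>(absBase); // concrete type visible to the map
        p->fillarray();
#pragma omp target parallel for map(to: p[0:1])      // maps sizeof(Derived) bytes; p is translated on entry
        for (int i = 0; i < 10; i++) {
                Derived d1(*p);                      // device-side copy, as in the earlier working example
                printf("%d\n", d1.f2());
        }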


On Tue, Jan 21, 2020 at 5:30 PM Itaru Kitayama <itaru.kitayama at gmail.com>
wrote:

> By mapping the this pointer first, followed by the mapping of the array,
> the sample code worked as expected.
> Now I am wondering: in a map clause, is the order in which objects appear
> in the list respected by the compiler?
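> (For reference, a sketch of one way to express that ordering, with separate
> enter data directives as in the code at the top of this mail:
>
>         #pragma omp target enter data map(to: this[0:1])
>         #pragma omp target enter data map(to: this->arr[0:100])
>
> The question above is whether a single combined map clause is guaranteed to
> be processed in the same order.)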
>
> On Mon, Jan 20, 2020 at 10:01 AM Itaru Kitayama <itaru.kitayama at gmail.com>
> wrote:
>
>> I have updated the code, as a dynamically allocated array is much more
>> important in practice:
>> #include <stdio.h>
>> #include <iostream>
>>
>> class AbsBase {
>> public:
>>         virtual int f() = 0;
>>         virtual void map_in() = 0;
>>         virtual void map_out() = 0;
>> };
>>
>> class Derived : public AbsBase {
>> private:
>>         int *arr = new int[100];
>>         //int arr[100];
>> public:
>>         int f() { return arr[0]; }
>>         void fillarray() {
>>                 arr[0] = 1234;
>>         }
>>         virtual void map_in() {
>>                 #pragma omp target enter data map(to:this[0:1], this->arr[0:1])
>>         }
>>         virtual void map_out() {
>>                 #pragma omp target exit data map(from:this[0:1], this->arr[0:1])
>>         }
>> };
>>
>> int main() {
>>         AbsBase *absBase = new Derived();
>>         static_cast<Derived*>(absBase)->fillarray();
>>         static_cast<Derived*>(absBase)->map_in();
>> #pragma omp target parallel for
>>         for (int i=0;i<10;i++) {
>>                 Derived d1(*static_cast<Derived*>(absBase));
>>                 printf("arr[0] is %d\n", d1.f());
>>         }
>>         static_cast<Derived*>(absBase)->map_out();
>> }
>>
>> Here are the relevant errors:
>> [...]
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010023511268,
>> Size=8)...
>> Libomptarget --> Creating new map entry: HstBase=0x0000010023511260,
>> HstBegin=0x0000010023511268, HstEnd=0x0000010023511270,
>> TgtBegin=0x0000110048600000
>> Libomptarget --> There are 8 bytes allocated at target address
>> 0x0000110048600000 - is new
>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010023511260,
>> Size=16)...
>> Libomptarget --> WARNING: Pointer is not mapped but section extends into
>> already mapped data
>> Libomptarget --> Explicit extension of mapping is not allowed.
>> Libomptarget --> Call to getOrAllocTgtPtr returned null pointer (device
>> failure or illegal mapping).
>> Libomptarget --> There are 16 bytes allocated at target address
>> 0x0000000000000000 - is new
>> Libomptarget --> Moving 16 bytes (hst:0x0000010023511260) ->
>> (tgt:0x0000000000000000)
>> Target CUDA RTL --> Error when copying data from host to device.
>> Pointers: host = 0x0000010023511260, device = 0x0000000000000000, size = 16
>> Target CUDA RTL --> CUDA error is: invalid argument
>> Libomptarget --> Copying data to device failed.
>> Libomptarget fatal error 1: failure of target construct while offloading
>> is mandatory
>> [...]
>>
>> On Sun, Jan 19, 2020 at 3:26 PM Itaru Kitayama <itaru.kitayama at gmail.com>
>> wrote:
>>
>>> Tom,
>>>
>>> Below builds, but fails at run time:
>>>
>>> #include <stdio.h>
>>>
>>> class AbsBase {
>>> public:
>>>         virtual int f() = 0;
>>>         virtual void map_in() = 0;
>>>         virtual void map_out() = 0;
>>> };
>>>
>>> class Derived : public AbsBase {
>>> private:
>>>         int *arr = new int[100];
>>> public:
>>>         int f() { return arr[0]; }
>>>         void fillarray() {
>>>                 arr[0] = 1234;
>>>         }
>>>         virtual void map_in() {
>>>                 #pragma omp target enter data map(to:this[0:1], this->arr[0:1])
>>>         }
>>>         virtual void map_out() {
>>>                 #pragma omp target exit data map(from:this[0:1], this->arr[0:1])
>>>         }
>>> };
>>>
>>> int main() {
>>>         AbsBase *absBase = new Derived();
>>>         static_cast<Derived*>(absBase)->fillarray();
>>>         static_cast<Derived*>(absBase)->map_in();
>>>
>>> #pragma omp target parallel for
>>>         for (int i=0;i<10;i++) {
>>>                 Derived d1(*static_cast<Derived*>(absBase));
>>>                 printf("arr[0] is %d\n", d1.f());
>>>         }
>>>         absBase->map_out();
>>> }
>>>
>>> What's wrong with my code?
>>>
>>> [kitayama1 at juronc11 pcp0151]$ ./a.out
>>> Libomptarget --> Loading RTLs...
>>> Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
>>> Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so
>>> '!
>>> Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4
>>> devices!
>>> Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
>>> Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so':
>>> libomptarget.rtl.x86_64.so: cannot open shared object file: No such
>>> file or directory!
>>> Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
>>> Target CUDA RTL --> Start initializing CUDA
>>> Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
>>> Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 1
>>> devices!
>>> Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
>>> Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so':
>>> libomptarget.rtl.aarch64.so: cannot open shared object file: No such
>>> file or directory!
>>> Libomptarget --> RTLs loaded!
>>> Libomptarget --> Image 0x0000000010001a90 is NOT compatible with RTL
>>> libomptarget.rtl.ppc64.so!
>>> Libomptarget --> Image 0x0000000010001a90 is compatible with RTL
>>> libomptarget.rtl.cuda.so!
>>> Libomptarget --> RTL 0x0000010000a6d6d0 has index 0!
>>> Libomptarget --> Registering image 0x0000000010001a90 with RTL
>>> libomptarget.rtl.cuda.so!
>>> Libomptarget --> Done registering entries!
>>> Libomptarget --> Call to omp_get_num_devices returning 1
>>> Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices
>>> were found)
>>> Libomptarget --> Entering data begin region for device -1 with 3 mappings
>>> Libomptarget --> Use default device id 0
>>> Libomptarget --> Checking whether device 0 is ready.
>>> Libomptarget --> Is the device 0 (local ID 0) initialized? 0
>>> Target CUDA RTL --> Init requires flags to 1
>>> Target CUDA RTL --> Getting device 0
>>> Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard
>>> team limit 65536, capping at the hard limit
>>> Target CUDA RTL --> Using 1024 CUDA threads per block
>>> Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp
>>> size 32
>>> Target CUDA RTL --> Default number of teams set according to library's
>>> default 128
>>> Target CUDA RTL --> Default number of threads set according to library's
>>> default 128
>>> Libomptarget --> Device 0 is ready to use.
>>> Target CUDA RTL --> Load data from image 0x0000000010001a90
>>> Target CUDA RTL --> CUDA module successfully loaded!
>>> Target CUDA RTL --> Entry point 0x0000000000000000 maps to
>>> __omp_offloading_30_809bddc5_main_l31 (0x00001100003a99d0)
>>> Target CUDA RTL --> Sending global device environment data 4 bytes
>>> Libomptarget --> Entry  0: Base=0x0000010000ac1260,
>>> Begin=0x0000010000ac1268, Size=8, Type=0x20
>>> Libomptarget --> Entry  1: Base=0x0000010000ac1260,
>>> Begin=0x0000010000ac1260, Size=16, Type=0x1000000000001
>>> Libomptarget --> Entry  2: Base=0x0000010000ac1268,
>>> Begin=0x0000010000ac1280, Size=4, Type=0x1000000000011
>>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010000ac1268,
>>> Size=8)...
>>> Libomptarget --> Creating new map entry: HstBase=0x0000010000ac1260,
>>> HstBegin=0x0000010000ac1268, HstEnd=0x0000010000ac1270,
>>> TgtBegin=0x0000110048600000
>>> Libomptarget --> There are 8 bytes allocated at target address
>>> 0x0000110048600000 - is new
>>> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010000ac1260,
>>> Size=16)...
>>> Libomptarget --> WARNING: Pointer is not mapped but section extends into
>>> already mapped data
>>> Libomptarget --> Explicit extension of mapping is not allowed.
>>> Libomptarget --> Call to getOrAllocTgtPtr returned null pointer (device
>>> failure or illegal mapping).
>>> Libomptarget --> There are 16 bytes allocated at target address
>>> 0x0000000000000000 - is new
>>> Libomptarget --> Moving 16 bytes (hst:0x0000010000ac1260) ->
>>> (tgt:0x0000000000000000)
>>> Target CUDA RTL --> Error when copying data from host to device.
>>> Pointers: host = 0x0000010000ac1260, device = 0x0000000000000000, size = 16
>>> Target CUDA RTL --> CUDA error is: invalid argument
>>> Libomptarget --> Copying data to device failed.
>>> Libomptarget fatal error 1: failure of target construct while offloading
>>> is mandatory
>>> Libomptarget --> Unloading target library!
>>> Libomptarget --> Image 0x0000000010001a90 is compatible with RTL
>>> 0x0000010000a6d6d0!
>>> Libomptarget --> Unregistered image 0x0000000010001a90 from RTL
>>> 0x0000010000a6d6d0!
>>> Libomptarget --> Done unregistering images!
>>> Libomptarget --> Removing translation table for descriptor
>>> 0x000000001007dff8
>>> Libomptarget --> Done unregistering library!
>>>
>>> On Sat, Jan 18, 2020 at 11:49 AM Scogland, Tom <scogland1 at llnl.gov>
>>> wrote:
>>>
>>>> That's the map of "*this" talking, which we made possible in a newer
>>>> standard than the one implemented. If you change it over to mapping either
>>>> this by an array section or a reference to *this, it should work. The
>>>> general pattern is to use that kind of mapping method to get the correct
>>>> static type at the point of the map.
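>>>> (A sketch of what that might look like inside a member function of
>>>> Derived; a hedged sketch, since the exact spelling Tom has in mind may
>>>> differ:
>>>>
>>>>         // this by an array section:
>>>>         #pragma omp target enter data map(to: this[0:1])
>>>>
>>>>         // or via a named reference, so the map sees the static type Derived:
>>>>         Derived &self = *this;
>>>>         #pragma omp target enter data map(to: self)
>>>> )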
>>>>
>>>> -Tom
>>>> ------------------------------
>>>> *From:* Itaru Kitayama <itaru.kitayama at gmail.com>
>>>> *Sent:* Friday, January 17, 2020 5:42 PM
>>>> *To:* Doerfert, Johannes <jdoerfert at anl.gov>
>>>> *Cc:* Alexey Bataev <a.bataev at hotmail.com>; openmp-dev <
>>>> openmp-dev at lists.llvm.org>; Scogland, Tom <scogland1 at llnl.gov>
>>>> *Subject:* Re: [Openmp-dev] Target CUDA RTL --> CUDA error is: an
>>>> illegal memory access was encountered
>>>>
>>>> Johannes,
>>>> Tested with Trunk on POWER8:
>>>>
>>>> test5.cpp:20:40: error: expected expression containing only member
>>>> accesses and/or array sections based on named variables
>>>>                 #pragma omp target enter data map(to:*this,
>>>> this->arr[0:100])
>>>>                                                      ^~~~~
>>>> test5.cpp:23:40: error: expected expression containing only member
>>>> accesses and/or array sections based on named variables
>>>>                 #pragma omp target enter data map(to:*this,
>>>> this->arr[0:100])
>>>>
>>>>
>>>> On Sat, Jan 18, 2020 at 4:50 AM Doerfert, Johannes <jdoerfert at anl.gov>
>>>> wrote:
>>>>
>>>> Check the attached email from Tom for a way to make this work. Let me
>>>> know what you think
>>>>
>>>>
>>>>
>>>> ---------------------------------------
>>>> Johannes Doerfert
>>>> Researcher
>>>>
>>>> Argonne National Laboratory
>>>> Lemont, IL 60439, USA
>>>>
>>>> jdoerfert at anl.gov
>>>>
>>>> ________________________________________
>>>> From: Alexey Bataev <a.bataev at hotmail.com>
>>>> Sent: Friday, January 17, 2020 03:33
>>>> To: Itaru Kitayama
>>>> Cc: Doerfert, Johannes; openmp-dev
>>>> Subject: Re: [Openmp-dev] Target CUDA RTL --> CUDA error is: an illegal
>>>> memory access was encountered
>>>>
>>>> The compiler does not work the way you want. It is not an issue of
>>>> OpenMP but of C++ itself.
>>>>
>>>> Best regards,
>>>> Alexey Bataev
>>>>
>>>> On 16 Jan 2020, at 23:20, Itaru Kitayama <itaru.kitayama at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> #include <stdio.h>
>>>>
>>>> class AbsBase {
>>>> public:
>>>>         virtual int f() = 0;
>>>> };
>>>>
>>>> class Derived : public AbsBase {
>>>> private:
>>>>         //int *arr = new int[100];
>>>>         int arr[100];
>>>> public:
>>>>         int f() { return arr[0]; }
>>>>         void fillarray() {
>>>>                 arr[0] = 1234;
>>>>         }
>>>> };
>>>>
>>>> int main() {
>>>>         AbsBase *absBase = new Derived();
>>>>         Derived *p = static_cast<Derived*>(absBase);
>>>>         p->fillarray();
>>>> #pragma omp target parallel for map(to: p[0:1])
>>>>         for (int i=0;i<10;i++) {
>>>>                 //Derived d1(*static_cast<Derived*>(absBase));
>>>>                 Derived d1(*p);
>>>>                 printf("arr[0] is %d\n", d1.f());
>>>>         }
>>>> }
>>>>
>>>> The above gives me what I wanted to see, but I would like to avoid
>>>> casting a pointer to an abstract base class when the instantiation is done
>>>> by the derived class.
>>>>
>>>>
>>>> On Fri, Jan 17, 2020 at 6:07 PM Alexey Bataev <a.bataev at hotmail.com
>>>> <mailto:a.bataev at hotmail.com>> wrote:
>>>> absBase is a pointer to a base class? Then no, the compiler is not aware
>>>> that it is a pointer to the derived class and copies the array as an array
>>>> of the base class, not the derived one.
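>>>> (In other words, roughly:
>>>>
>>>>         AbsBase *absBase = new Derived();
>>>>         // map(to: absBase[0:1]) transfers sizeof(AbsBase) bytes only;
>>>>         // the members added by Derived are never copied to the device.
>>>>         Derived *p = static_cast<Derived*>(absBase);
>>>>         // map(to: p[0:1]) transfers sizeof(Derived) bytes.
>>>> )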
>>>>
>>>> Best regards,
>>>> Alexey Bataev
>>>>
>>>> On 16 Jan 2020, at 22:58, Itaru Kitayama <itaru.kitayama at gmail.com
>>>> <mailto:itaru.kitayama at gmail.com>> wrote:
>>>>
>>>>
>>>> If I change class Derived's private data to an array of 100 integers, it
>>>> executes, but the first element of the array is reported as 0 instead of
>>>> 1234. Shouldn't a bitwise copy work when mapping the AbsBase pointer as an
>>>> array of one element?
>>>>
>>>> On Wed, Jan 15, 2020 at 2:32 PM Alexey Bataev <a.bataev at hotmail.com
>>>> <mailto:a.bataev at hotmail.com>> wrote:
>>>> And it should fail, since you're mapping a non-trivially-copyable type.
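>>>> (As a side note, one way to see this, assuming <type_traits> is included:
>>>>
>>>>         static_assert(std::is_trivially_copyable<Derived>::value,
>>>>                       "Derived is not trivially copyable");
>>>>
>>>> fails to compile, because a class with virtual functions is never
>>>> trivially copyable.)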
>>>>
>>>> Best regards,
>>>> Alexey Bataev
>>>>
>>>> On 14 Jan 2020, at 19:14, Itaru Kitayama <itaru.kitayama at gmail.com
>>>> <mailto:itaru.kitayama at gmail.com>> wrote:
>>>>
>>>>
>>>> This piece of C++ program execution fails at run time.
>>>>
>>>> #include <stdio.h>
>>>>
>>>> class AbsBase {
>>>> public:
>>>>         virtual int f() = 0;
>>>> };
>>>>
>>>> class Derived : public AbsBase {
>>>> private:
>>>>         int *arr = new int[100];
>>>> public:
>>>>         int f() { return arr[0]; }
>>>>         void fillarray() {
>>>>                 arr[0] = 1234;
>>>>         }
>>>> };
>>>>
>>>> int main() {
>>>>         AbsBase *absBase = new Derived();
>>>>         static_cast<Derived*>(absBase)->fillarray();
>>>> #pragma omp target parallel for map(to: absBase[0:1])
>>>>         for (int i=0;i<10;i++) {
>>>>                 Derived d1(*static_cast<Derived*>(absBase));
>>>>                 printf("arr[0] is %d\n", d1.f());
>>>>         }
>>>> }
>>>>
>>>> On Sun, Jan 12, 2020 at 1:45 PM Doerfert, Johannes <jdoerfert at anl.gov
>>>> <mailto:jdoerfert at anl.gov>> wrote:
>>>> On 01/12, Itaru Kitayama wrote:
>>>> > Do you guys have a timeframe for that feature
>>>> > Implemented?
>>>>
>>>> I do not and I don't know anyone who will drive this right now.
>>>>
>>>> As mentioned before, you should be able to "move/copy-create" the
>>>> elements on the device in order to use virtual functions.
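>>>> (That is the pattern used elsewhere in this thread; a rough sketch,
>>>> reusing the names from those examples:
>>>>
>>>> #pragma omp target parallel for map(to: p[0:1])
>>>>         for (int i = 0; i < 10; i++) {
>>>>                 // copy-create on the device; d1's vtable is set up
>>>>                 // device-side, so calling its virtual functions is fine
>>>>                 Derived d1(*p);
>>>>                 printf("arr[0] is %d\n", d1.f());
>>>>         }
>>>> )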
>>>>
>>>>
>>>> > On Sun, Jan 12, 2020 at 12:51 Doerfert, Johannes <jdoerfert at anl.gov
>>>> <mailto:jdoerfert at anl.gov>> wrote:
>>>> >
>>>> > > On 01/11, Alexey Bataev via Openmp-dev wrote:
>>>> > > > Virtual functions are not supported.
>>>> > >
>>>> > > Not yet ;).
>>>> > >
>>>> > > We'll get it with 5.1 so we might actually implement it soon. Till
>>>> then,
>>>> > > you have to create the object on the device you call the virtual
>>>> > > function.
>>>> > >
>>>>
>>>> --
>>>>
>>>> Johannes Doerfert
>>>> Researcher
>>>>
>>>> Argonne National Laboratory
>>>> Lemont, IL 60439, USA
>>>>
>>>> jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>
>>>>
>>>>