[Openmp-dev] Target CUDA RTL --> CUDA error is: an illegal memory access was encountered

Itaru Kitayama via Openmp-dev openmp-dev at lists.llvm.org
Sat Jan 18 22:26:06 PST 2020


Tom,

Below builds, but fails at run time:

#include <stdio.h>

class AbsBase {
public:
        virtual int f() = 0;
virtual void map_in() = 0;
virtual void map_out() = 0;
};

class Derived : public AbsBase {
private:
        int *arr = new int[100];
public:
        int f() { return arr[0]; }
        void fillarray() {
                arr[0] = 1234;
        }
virtual void map_in() {
#pragma omp target enter data map(to:this[0:1], this->arr[0:1])
}
virtual void map_out() {
#pragma omp target exit data map(from:this[0:1], this->arr[0:1])
}
};

int main() {
        AbsBase *absBase = new Derived();
static_cast<Derived*>(absBase)->fillarray();
static_cast<Derived*>(absBase)->map_in();

#pragma omp target parallel for
        for (int i=0;i<10;i++) {
                Derived d1(*static_cast<Derived*>(absBase));
                printf("arr[0] is %d\n", d1.f());
        }
absBase->map_out();
}

What's wrong with my code?

[kitayama1 at juronc11 pcp0151]$ ./a.out
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4
devices!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so':
libomptarget.rtl.x86_64.so: cannot open shared object file: No such file or
directory!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 1
devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so':
libomptarget.rtl.aarch64.so: cannot open shared object file: No such file
or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000010001a90 is NOT compatible with RTL
libomptarget.rtl.ppc64.so!
Libomptarget --> Image 0x0000000010001a90 is compatible with RTL
libomptarget.rtl.cuda.so!
Libomptarget --> RTL 0x0000010000a6d6d0 has index 0!
Libomptarget --> Registering image 0x0000000010001a90 with RTL
libomptarget.rtl.cuda.so!
Libomptarget --> Done registering entries!
Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices
were found)
Libomptarget --> Entering data begin region for device -1 with 3 mappings
Libomptarget --> Use default device id 0
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Target CUDA RTL --> Init requires flags to 1
Target CUDA RTL --> Getting device 0
Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard
team limit 65536, capping at the hard limit
Target CUDA RTL --> Using 1024 CUDA threads per block
Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp
size 32
Target CUDA RTL --> Default number of teams set according to library's
default 128
Target CUDA RTL --> Default number of threads set according to library's
default 128
Libomptarget --> Device 0 is ready to use.
Target CUDA RTL --> Load data from image 0x0000000010001a90
Target CUDA RTL --> CUDA module successfully loaded!
Target CUDA RTL --> Entry point 0x0000000000000000 maps to
__omp_offloading_30_809bddc5_main_l31 (0x00001100003a99d0)
Target CUDA RTL --> Sending global device environment data 4 bytes
Libomptarget --> Entry  0: Base=0x0000010000ac1260,
Begin=0x0000010000ac1268, Size=8, Type=0x20
Libomptarget --> Entry  1: Base=0x0000010000ac1260,
Begin=0x0000010000ac1260, Size=16, Type=0x1000000000001
Libomptarget --> Entry  2: Base=0x0000010000ac1268,
Begin=0x0000010000ac1280, Size=4, Type=0x1000000000011
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010000ac1268,
Size=8)...
Libomptarget --> Creating new map entry: HstBase=0x0000010000ac1260,
HstBegin=0x0000010000ac1268, HstEnd=0x0000010000ac1270,
TgtBegin=0x0000110048600000
Libomptarget --> There are 8 bytes allocated at target address
0x0000110048600000 - is new
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010000ac1260,
Size=16)...
Libomptarget --> WARNING: Pointer is not mapped but section extends into
already mapped data
Libomptarget --> Explicit extension of mapping is not allowed.
Libomptarget --> Call to getOrAllocTgtPtr returned null pointer (device
failure or illegal mapping).
Libomptarget --> There are 16 bytes allocated at target address
0x0000000000000000 - is new
Libomptarget --> Moving 16 bytes (hst:0x0000010000ac1260) ->
(tgt:0x0000000000000000)
Target CUDA RTL --> Error when copying data from host to device. Pointers:
host = 0x0000010000ac1260, device = 0x0000000000000000, size = 16
Target CUDA RTL --> CUDA error is: invalid argument
Libomptarget --> Copying data to device failed.
Libomptarget fatal error 1: failure of target construct while offloading is
mandatory
Libomptarget --> Unloading target library!
Libomptarget --> Image 0x0000000010001a90 is compatible with RTL
0x0000010000a6d6d0!
Libomptarget --> Unregistered image 0x0000000010001a90 from RTL
0x0000010000a6d6d0!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor
0x000000001007dff8
Libomptarget --> Done unregistering library!

On Sat, Jan 18, 2020 at 11:49 AM Scogland, Tom <scogland1 at llnl.gov> wrote:

> That's the map of "*this" talking, which we made possible in a newer
> standard than is implemented.  If you change it over to mapping either this
> by an array section or a reference to *this it should work.  The general
> pattern is to use that kind of mapping method to get the correct static
> type at the point of the map.
>
> -Tom
> ------------------------------
> *From:* Itaru Kitayama <itaru.kitayama at gmail.com>
> *Sent:* Friday, January 17, 2020 5:42 PM
> *To:* Doerfert, Johannes <jdoerfert at anl.gov>
> *Cc:* Alexey Bataev <a.bataev at hotmail.com>; openmp-dev <
> openmp-dev at lists.llvm.org>; Scogland, Tom <scogland1 at llnl.gov>
> *Subject:* Re: [Openmp-dev] Target CUDA RTL --> CUDA error is: an illegal
> memory access was encountered
>
> Johannes,
> Tested with Trunk on POWER8:
>
> test5.cpp:20:40: error: expected expression containing only member
> accesses and/or array sections based on named variables
>                 #pragma omp target enter data map(to:*this,
> this->arr[0:100])
>                                                      ^~~~~
> test5.cpp:23:40: error: expected expression containing only member
> accesses and/or array sections based on named variables
>                 #pragma omp target enter data map(to:*this,
> this->arr[0:100])
>
>
> On Sat, Jan 18, 2020 at 4:50 AM Doerfert, Johannes <jdoerfert at anl.gov>
> wrote:
>
> Check the attached email from Tom for a way to make this work. Let me know
> what you think
>
>
>
> ---------------------------------------
> Johannes Doerfert
> Researcher
>
> Argonne National Laboratory
> Lemont, IL 60439, USA
>
> jdoerfert at anl.gov
>
> ________________________________________
> From: Alexey Bataev <a.bataev at hotmail.com>
> Sent: Friday, January 17, 2020 03:33
> To: Itaru Kitayama
> Cc: Doerfert, Johannes; openmp-dev
> Subject: Re: [Openmp-dev] Target CUDA RTL --> CUDA error is: an illegal
> memory access was encountered
>
> The compiler does not work the way you want. It is not an issue of OpenMP
> but of C++ itself.
>
> Best regards,
> Alexey Bataev
>
> 16 янв. 2020 г., в 23:20, Itaru Kitayama <itaru.kitayama at gmail.com>
> написал(а):
>
> 
> #include <stdio.h>
>
> class AbsBase {
> public:
>         virtual int f() = 0;
> };
>
> class Derived : public AbsBase {
> private:
>         //int *arr = new int[100];
>         int arr[100];
> public:
>         int f() { return arr[0]; }
>         void fillarray() {
>                 arr[0] = 1234;
>         }
> };
>
> int main() {
>         AbsBase *absBase = new Derived();
> Derived *p = static_cast<Derived*>(absBase);
>         p->fillarray();
> #pragma omp target parallel for map(to: p[0:1])
>         for (int i=0;i<10;i++) {
>                 //Derived d1(*static_cast<Derived*>(absBase));
>                 Derived d1(*p);
>                 printf("arr[0] is %d\n", d1.f());
>         }
> }
>
> Above gives me what I wanted to see, but I would like to avoid doing a
> cast on a pointer
> to an abstract base class, but the instantiation is done by the derived
> class.
>
>
> On Fri, Jan 17, 2020 at 6:07 PM Alexey Bataev <a.bataev at hotmail.com
> <mailto:a.bataev at hotmail.com>> wrote:
> AbsBase is a pointer to a base class? Then not, compiler is not aware that
> it is a pointer to the derived class and copies the array as an array of
> base class, nit derived one.
>
> Best regards,
> Alexey Bataev
>
> 16 янв. 2020 г., в 22:58, Itaru Kitayama <itaru.kitayama at gmail.com<mailto:
> itaru.kitayama at gmail.com>> написал(а):
>
> 
> If I change class Derived's private data to an array of 100 integers, it
> executes, but
> the first element of the array is reported as 0, instead of 1234.
> Shouldn't bitwise copy
> work when mapping the pointer to AbsBase class like an array of one?
>
> On Wed, Jan 15, 2020 at 2:32 PM Alexey Bataev <a.bataev at hotmail.com
> <mailto:a.bataev at hotmail.com>> wrote:
> And it should fail since you're mapping non trivially copyable type.
>
> Best regards,
> Alexey Bataev
>
> 14 янв. 2020 г., в 19:14, Itaru Kitayama <itaru.kitayama at gmail.com<mailto:
> itaru.kitayama at gmail.com>> написал(а):
>
> 
> This piece of C++ program execution fails at run time.
>
> #include <stdio.h>
>
> class AbsBase {
> public:
>         virtual int f() = 0;
> };
>
> class Derived : public AbsBase {
> private:
>         int *arr = new int[100];
> public:
>         int f() { return arr[0]; }
>         void fillarray() {
>                 arr[0] = 1234;
>         }
> };
>
> int main() {
>         AbsBase *absBase = new Derived();
>         static_cast<Derived*>(absBase)->fillarray();
> #pragma omp target parallel for map(to: absBase[0:1])
>         for (int i=0;i<10;i++) {
>                 Derived d1(*static_cast<Derived*>(absBase));
>                 printf("arr[0] is %d\n", d1.f());
>         }
> }
>
> On Sun, Jan 12, 2020 at 1:45 PM Doerfert, Johannes <jdoerfert at anl.gov
> <mailto:jdoerfert at anl.gov>> wrote:
> On 01/12, Itaru Kitayama wrote:
> > Do you guys have a timeframe for that feature
> > Implemented?
>
> I do not and I don't know anyone who will drive this right now.
>
> As mentioned before, you should be able to "move/copy-create" the
> elements on the device in order to use virtual functions.
>
>
> > On Sun, Jan 12, 2020 at 12:51 Doerfert, Johannes <jdoerfert at anl.gov
> <mailto:jdoerfert at anl.gov>> wrote:
> >
> > > On 01/11, Alexey Bataev via Openmp-dev wrote:
> > > > Virtual functions are not supported.
> > >
> > > Not yet ;).
> > >
> > > We'll get it with 5.1 so we might actually implement it soon. Till
> then,
> > > you have to create the object on the device you call the virtual
> > > function.
> > >
>
> --
>
> Johannes Doerfert
> Researcher
>
> Argonne National Laboratory
> Lemont, IL 60439, USA
>
> jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/openmp-dev/attachments/20200119/1f074bca/attachment-0001.html>


More information about the Openmp-dev mailing list