[Openmp-dev] Target CUDA RTL --> CUDA error is: an illegal memory access was encountered

Alexey Bataev via Openmp-dev openmp-dev at lists.llvm.org
Sun Jan 19 17:16:20 PST 2020


Itaru, try #pragma omp target parallel for is_device_ptr(absBase)

Best regards,
Alexey Bataev

19 янв. 2020 г., в 20:01, Itaru Kitayama <itaru.kitayama at gmail.com> написал(а):


I have updated the code, as a dynamically allocated array is in practice, much important:
#include <stdio.h>
#include <iostream>

class AbsBase {
public:
        virtual int f() = 0;
virtual void map_in() = 0;
virtual void map_out() = 0;
};

class Derived : public AbsBase {
private:
        int *arr = new int[100];
        //int arr[100];
public:
        int f() { return arr[0]; }
        void fillarray() {
                arr[0] = 1234;
        }
virtual void map_in() {
#pragma omp target enter data map(to:this[0:1], this->arr[0:1])
}
virtual void map_out() {
#pragma omp target exit data map(from:this[0:1], this->arr[0:1])
}
};

int main() {
        AbsBase *absBase = new Derived();
static_cast<Derived*>(absBase)->fillarray();
static_cast<Derived*>(absBase)->map_in();
#pragma omp target parallel for
        for (int i=0;i<10;i++) {
                Derived d1(*static_cast<Derived*>(absBase));
                printf("arr[0] is %d\n", d1.f());
        }
static_cast<Derived*>(absBase)->map_out();
}

Here are the relevant errors:
[...]
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010023511268, Size=8)...
Libomptarget --> Creating new map entry: HstBase=0x0000010023511260, HstBegin=0x0000010023511268, HstEnd=0x0000010023511270, TgtBegin=0x0000110048600000
Libomptarget --> There are 8 bytes allocated at target address 0x0000110048600000 - is new
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010023511260, Size=16)...
Libomptarget --> WARNING: Pointer is not mapped but section extends into already mapped data
Libomptarget --> Explicit extension of mapping is not allowed.
Libomptarget --> Call to getOrAllocTgtPtr returned null pointer (device failure or illegal mapping).
Libomptarget --> There are 16 bytes allocated at target address 0x0000000000000000 - is new
Libomptarget --> Moving 16 bytes (hst:0x0000010023511260) -> (tgt:0x0000000000000000)
Target CUDA RTL --> Error when copying data from host to device. Pointers: host = 0x0000010023511260, device = 0x0000000000000000, size = 16
Target CUDA RTL --> CUDA error is: invalid argument
Libomptarget --> Copying data to device failed.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory
[...]

On Sun, Jan 19, 2020 at 3:26 PM Itaru Kitayama <itaru.kitayama at gmail.com<mailto:itaru.kitayama at gmail.com>> wrote:
Tom,

Below builds, but fails at run time:

#include <stdio.h>

class AbsBase {
public:
        virtual int f() = 0;
virtual void map_in() = 0;
virtual void map_out() = 0;
};

class Derived : public AbsBase {
private:
        int *arr = new int[100];
public:
        int f() { return arr[0]; }
        void fillarray() {
                arr[0] = 1234;
        }
virtual void map_in() {
#pragma omp target enter data map(to:this[0:1], this->arr[0:1])
}
virtual void map_out() {
#pragma omp target exit data map(from:this[0:1], this->arr[0:1])
}
};

int main() {
        AbsBase *absBase = new Derived();
static_cast<Derived*>(absBase)->fillarray();
static_cast<Derived*>(absBase)->map_in();

#pragma omp target parallel for
        for (int i=0;i<10;i++) {
                Derived d1(*static_cast<Derived*>(absBase));
                printf("arr[0] is %d\n", d1.f());
        }
absBase->map_out();
}

What's wrong with my code?

[kitayama1 at juronc11 pcp0151]$ ./a.out
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so<http://libomptarget.rtl.ppc64.so>'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so<http://libomptarget.rtl.ppc64.so>'!
Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so<http://libomptarget.rtl.ppc64.so> supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so<http://libomptarget.rtl.x86_64.so>'...
Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so<http://libomptarget.rtl.x86_64.so>': libomptarget.rtl.x86_64.so<http://libomptarget.rtl.x86_64.so>: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so<http://libomptarget.rtl.cuda.so>'...
Target CUDA RTL --> Start initializing CUDA
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so<http://libomptarget.rtl.cuda.so>'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.so<http://libomptarget.rtl.cuda.so> supporting 1 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so<http://libomptarget.rtl.aarch64.so>'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so<http://libomptarget.rtl.aarch64.so>': libomptarget.rtl.aarch64.so<http://libomptarget.rtl.aarch64.so>: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000010001a90 is NOT compatible with RTL libomptarget.rtl.ppc64.so<http://libomptarget.rtl.ppc64.so>!
Libomptarget --> Image 0x0000000010001a90 is compatible with RTL libomptarget.rtl.cuda.so<http://libomptarget.rtl.cuda.so>!
Libomptarget --> RTL 0x0000010000a6d6d0 has index 0!
Libomptarget --> Registering image 0x0000000010001a90 with RTL libomptarget.rtl.cuda.so<http://libomptarget.rtl.cuda.so>!
Libomptarget --> Done registering entries!
Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)
Libomptarget --> Entering data begin region for device -1 with 3 mappings
Libomptarget --> Use default device id 0
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Target CUDA RTL --> Init requires flags to 1
Target CUDA RTL --> Getting device 0
Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard team limit 65536, capping at the hard limit
Target CUDA RTL --> Using 1024 CUDA threads per block
Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp size 32
Target CUDA RTL --> Default number of teams set according to library's default 128
Target CUDA RTL --> Default number of threads set according to library's default 128
Libomptarget --> Device 0 is ready to use.
Target CUDA RTL --> Load data from image 0x0000000010001a90
Target CUDA RTL --> CUDA module successfully loaded!
Target CUDA RTL --> Entry point 0x0000000000000000 maps to __omp_offloading_30_809bddc5_main_l31 (0x00001100003a99d0)
Target CUDA RTL --> Sending global device environment data 4 bytes
Libomptarget --> Entry  0: Base=0x0000010000ac1260, Begin=0x0000010000ac1268, Size=8, Type=0x20
Libomptarget --> Entry  1: Base=0x0000010000ac1260, Begin=0x0000010000ac1260, Size=16, Type=0x1000000000001
Libomptarget --> Entry  2: Base=0x0000010000ac1268, Begin=0x0000010000ac1280, Size=4, Type=0x1000000000011
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010000ac1268, Size=8)...
Libomptarget --> Creating new map entry: HstBase=0x0000010000ac1260, HstBegin=0x0000010000ac1268, HstEnd=0x0000010000ac1270, TgtBegin=0x0000110048600000
Libomptarget --> There are 8 bytes allocated at target address 0x0000110048600000 - is new
Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010000ac1260, Size=16)...
Libomptarget --> WARNING: Pointer is not mapped but section extends into already mapped data
Libomptarget --> Explicit extension of mapping is not allowed.
Libomptarget --> Call to getOrAllocTgtPtr returned null pointer (device failure or illegal mapping).
Libomptarget --> There are 16 bytes allocated at target address 0x0000000000000000 - is new
Libomptarget --> Moving 16 bytes (hst:0x0000010000ac1260) -> (tgt:0x0000000000000000)
Target CUDA RTL --> Error when copying data from host to device. Pointers: host = 0x0000010000ac1260, device = 0x0000000000000000, size = 16
Target CUDA RTL --> CUDA error is: invalid argument
Libomptarget --> Copying data to device failed.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory
Libomptarget --> Unloading target library!
Libomptarget --> Image 0x0000000010001a90 is compatible with RTL 0x0000010000a6d6d0!
Libomptarget --> Unregistered image 0x0000000010001a90 from RTL 0x0000010000a6d6d0!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x000000001007dff8
Libomptarget --> Done unregistering library!

On Sat, Jan 18, 2020 at 11:49 AM Scogland, Tom <scogland1 at llnl.gov<mailto:scogland1 at llnl.gov>> wrote:
That's the map of "*this" talking, which we made possible in a newer standard than is implemented.  If you change it over to mapping either this by an array section or a reference to *this it should work.  The general pattern is to use that kind of mapping method to get the correct static type at the point of the map.

-Tom
________________________________
From: Itaru Kitayama <itaru.kitayama at gmail.com<mailto:itaru.kitayama at gmail.com>>
Sent: Friday, January 17, 2020 5:42 PM
To: Doerfert, Johannes <jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>>
Cc: Alexey Bataev <a.bataev at hotmail.com<mailto:a.bataev at hotmail.com>>; openmp-dev <openmp-dev at lists.llvm.org<mailto:openmp-dev at lists.llvm.org>>; Scogland, Tom <scogland1 at llnl.gov<mailto:scogland1 at llnl.gov>>
Subject: Re: [Openmp-dev] Target CUDA RTL --> CUDA error is: an illegal memory access was encountered

Johannes,
Tested with Trunk on POWER8:

test5.cpp:20:40: error: expected expression containing only member accesses and/or array sections based on named variables
                #pragma omp target enter data map(to:*this, this->arr[0:100])
                                                     ^~~~~
test5.cpp:23:40: error: expected expression containing only member accesses and/or array sections based on named variables
                #pragma omp target enter data map(to:*this, this->arr[0:100])


On Sat, Jan 18, 2020 at 4:50 AM Doerfert, Johannes <jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>> wrote:
Check the attached email from Tom for a way to make this work. Let me know what you think



---------------------------------------
Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>

________________________________________
From: Alexey Bataev <a.bataev at hotmail.com<mailto:a.bataev at hotmail.com>>
Sent: Friday, January 17, 2020 03:33
To: Itaru Kitayama
Cc: Doerfert, Johannes; openmp-dev
Subject: Re: [Openmp-dev] Target CUDA RTL --> CUDA error is: an illegal memory access was encountered

The compiler does not work the way you want. It is not an issue of OpenMP but of C++ itself.

Best regards,
Alexey Bataev

16 янв. 2020 г., в 23:20, Itaru Kitayama <itaru.kitayama at gmail.com<mailto:itaru.kitayama at gmail.com>> написал(а):


#include <stdio.h>

class AbsBase {
public:
        virtual int f() = 0;
};

class Derived : public AbsBase {
private:
        //int *arr = new int[100];
        int arr[100];
public:
        int f() { return arr[0]; }
        void fillarray() {
                arr[0] = 1234;
        }
};

int main() {
        AbsBase *absBase = new Derived();
Derived *p = static_cast<Derived*>(absBase);
        p->fillarray();
#pragma omp target parallel for map(to: p[0:1])
        for (int i=0;i<10;i++) {
                //Derived d1(*static_cast<Derived*>(absBase));
                Derived d1(*p);
                printf("arr[0] is %d\n", d1.f());
        }
}

Above gives me what I wanted to see, but I would like to avoid doing a cast on a pointer
to an abstract base class, but the instantiation is done by the derived class.


On Fri, Jan 17, 2020 at 6:07 PM Alexey Bataev <a.bataev at hotmail.com<mailto:a.bataev at hotmail.com><mailto:a.bataev at hotmail.com<mailto:a.bataev at hotmail.com>>> wrote:
AbsBase is a pointer to a base class? Then not, compiler is not aware that it is a pointer to the derived class and copies the array as an array of base class, nit derived one.

Best regards,
Alexey Bataev

16 янв. 2020 г., в 22:58, Itaru Kitayama <itaru.kitayama at gmail.com<mailto:itaru.kitayama at gmail.com><mailto:itaru.kitayama at gmail.com<mailto:itaru.kitayama at gmail.com>>> написал(а):


If I change class Derived's private data to an array of 100 integers, it executes, but
the first element of the array is reported as 0, instead of 1234. Shouldn't bitwise copy
work when mapping the pointer to AbsBase class like an array of one?

On Wed, Jan 15, 2020 at 2:32 PM Alexey Bataev <a.bataev at hotmail.com<mailto:a.bataev at hotmail.com><mailto:a.bataev at hotmail.com<mailto:a.bataev at hotmail.com>>> wrote:
And it should fail since you're mapping non trivially copyable type.

Best regards,
Alexey Bataev

14 янв. 2020 г., в 19:14, Itaru Kitayama <itaru.kitayama at gmail.com<mailto:itaru.kitayama at gmail.com><mailto:itaru.kitayama at gmail.com<mailto:itaru.kitayama at gmail.com>>> написал(а):


This piece of C++ program execution fails at run time.

#include <stdio.h>

class AbsBase {
public:
        virtual int f() = 0;
};

class Derived : public AbsBase {
private:
        int *arr = new int[100];
public:
        int f() { return arr[0]; }
        void fillarray() {
                arr[0] = 1234;
        }
};

int main() {
        AbsBase *absBase = new Derived();
        static_cast<Derived*>(absBase)->fillarray();
#pragma omp target parallel for map(to: absBase[0:1])
        for (int i=0;i<10;i++) {
                Derived d1(*static_cast<Derived*>(absBase));
                printf("arr[0] is %d\n", d1.f());
        }
}

On Sun, Jan 12, 2020 at 1:45 PM Doerfert, Johannes <jdoerfert at anl.gov<mailto:jdoerfert at anl.gov><mailto:jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>>> wrote:
On 01/12, Itaru Kitayama wrote:
> Do you guys have a timeframe for that feature
> Implemented?

I do not and I don't know anyone who will drive this right now.

As mentioned before, you should be able to "move/copy-create" the
elements on the device in order to use virtual functions.


> On Sun, Jan 12, 2020 at 12:51 Doerfert, Johannes <jdoerfert at anl.gov<mailto:jdoerfert at anl.gov><mailto:jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>>> wrote:
>
> > On 01/11, Alexey Bataev via Openmp-dev wrote:
> > > Virtual functions are not supported.
> >
> > Not yet ;).
> >
> > We'll get it with 5.1 so we might actually implement it soon. Till then,
> > you have to create the object on the device you call the virtual
> > function.
> >

--

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov<mailto:jdoerfert at anl.gov><mailto:jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/openmp-dev/attachments/20200120/11737499/attachment-0001.html>


More information about the Openmp-dev mailing list