[LLVMdev] Supporting heterogeneous computing in llvm.

Tue Jun 9 08:52:36 PDT 2015

Hi Samuel,

Thanks, that helps. I ran into a few other issues after that.

1. The build script for the target RTL apparently compiled it for sm_35,
so I needed to adjust the -omptargets triple to
nvptx64sm_35-nvidia-cuda (as opposed to the nvptx64-nvidia-cuda used
in the design document).

2. I had to pass -lelf on the clang command line to avoid several
undefined references from libomptarget's uses of libelf.

After that the code compiles, but I think it fails to properly
detect the target device: omp_get_num_devices() returns 0.
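A small standalone check can help separate a device-detection problem from an application problem. The sketch below assumes an OpenMP 4.0 compiler, and the helper names are hypothetical. Called from a trivial main() and built the same way as the application (with the -omptargets triple and -lelf above), a device count of 0 together with the target region reporting "host" might point to the runtime not registering the NVPTX plugin at all, rather than to a problem in the offloaded code. The `_OPENMP` guards let the same file build as plain C.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of offload devices the OpenMP runtime reports
   (0 when built without OpenMP support). */
int count_offload_devices(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Runs a trivial target region; returns 1 if it executed on the
   host (initial device), 0 if it ran on an offload device. */
int target_region_ran_on_host(void)
{
    int on_host = 1;
#ifdef _OPENMP
    /* Map the scalar back explicitly so the device's answer is visible. */
    #pragma omp target map(from: on_host)
    on_host = omp_is_initial_device();
#endif
    return on_host;
}

/* Prints a one-line summary of what the runtime sees (hypothetical helper). */
void report_offload_status(void)
{
    printf("devices: %d, target region ran on %s\n",
           count_offload_devices(),
           target_region_ran_on_host() ? "host" : "device");
}
```

Explicitly mapping `on_host` keeps the result observable whether the compiler follows the OpenMP 4.0 default for scalars (tofrom) or the 4.5 default (firstprivate).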

Cheers,
  Roel

On 09/06/15 17:23, Samuel Antão wrote:
> Hi Roel,
>
> You'd have to set LIBRARY_PATH to point to where libtarget-nvptx.a
> lives. At the moment we do not translate -L options for the
> target; they are considered to be meant for the host only. I should
> probably extend the documentation to explain this detail.
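The lookup that LIBRARY_PATH enables amounts to scanning a colon-separated list of directories for the archive. The helper below is a hypothetical illustration of that scan (it is not clang's or nvlink's actual code), which could be used to confirm that libtarget-nvptx.a is really reachable from the LIBRARY_PATH you exported.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Looks for `file` in each directory of a colon-separated search path
   (e.g. the value of LIBRARY_PATH). On success, writes the full path
   to `out` and returns 1; returns 0 if no directory contains the file.
   Empty entries and a NULL search path are tolerated. */
int find_in_search_path(const char *search_path, const char *file,
                        char *out, size_t out_size)
{
    const char *dir = search_path;
    while (dir != NULL && *dir != '\0') {
        const char *colon = strchr(dir, ':');
        size_t len = colon ? (size_t)(colon - dir) : strlen(dir);
        if (len > 0) {
            int n = snprintf(out, out_size, "%.*s/%s", (int)len, dir, file);
            /* Only test candidates that fit in the buffer. */
            if (n > 0 && (size_t)n < out_size && access(out, F_OK) == 0)
                return 1;
        }
        dir = colon ? colon + 1 : NULL;
    }
    return 0;
}
```

For example, `find_in_search_path(getenv("LIBRARY_PATH"), "libtarget-nvptx.a", buf, sizeof buf)` returns 1 and fills `buf` only if one of the listed directories contains the library.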
>
> Thanks,
> Samuel
>
> 2015-06-09 9:32 GMT-04:00 Roel Jordans <r.jordans at tue.nl>:
>
>     Hi Sergos and Samuel,
>
>     Thanks for the links, I've got it mostly working now.
>
>     I still have a problem with linking the code. It seems that the
>     clang driver doesn't pass its library search path to nvlink when
>     linking the generated CUDA code against the target library, so
>     nvlink fails to find libtarget-nvptx.a. Is there some flag or
>     environment variable that I should set here? Manually passing
>     nvlink a -L flag pointing to the appropriate path seems to work
>     for the linking step.
>
>     Cheers,
>       Roel
>
>     On 09/06/15 00:07, Samuel Antão wrote:
>
>         Hi Roel, Chris,
>
>         This is a summary of how you can add support for a different
>         offloading device on top of what we have on GitHub for OpenMP:
>
>         a) Download and install LLVM
>         (https://github.com/clang-omp/llvm_trunk) and clang
>         (https://github.com/clang-omp/clang_trunk) as usual.
>
>         b) Install the official LLVM OpenMP runtime library
>         (http://openmp.llvm.org). Clang expects it to be present in your
>         library path in order to compile OpenMP code (even if you do not
>         need any OpenMP feature other than offloading).
>
>         c) Install https://github.com/clang-omp/libomptarget (running 'make'
>         should do it). This library implements the API to control
>         offloading. It also contains, in ./RTLs, a set of plugins for the
>         targets we are testing this with (x86_64, powerpc64 and NVPTX).
>         You will need to implement a plugin for your target as well. The
>         interface used for these plugins is detailed in the document
>         proposed in
>         <http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html>.
>         You can look at the existing plugins for hints. In a nutshell, you
>         would have to implement code that allocates and moves data to your
>         device, returns a table of entry points and global variables given
>         a device library, and launches execution of a given entry point
>         with the provided list of arguments.
>
>         d) The current implementation expects the device library to use
>         ELF format. There is no reason for that other than that the
>         platforms we have tested this with so far use ELF. If your device
>         does not use ELF, __tgt_register_lib() (src/omptarget.cpp) would
>         have to be extended to understand your desired format. Otherwise,
>         you may just update src/targets_info.cpp with your ELF ID and
>         plugin name.
>
>         e) Offloading is driven by clang, so it has to be aware of the
>         toolchain required by your device. If your device toolchain is not
>         implemented in clang, you would have to add it in
>         lib/Driver/ToolChains.cpp.
>
>         f) Once everything is in place, you can compile your code by running
>         something like “clang -fopenmp -omptargets=your-target-triple app.c”.
>         If you do separate compilation, you will see that two different files
>         are generated for a given source file (the target file has the suffix
>         tgt-your-target-triple).
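To make step (c) more concrete, here is a toy sketch of the operations a plugin has to provide, using ordinary host memory as the "device". All names (toy_data_alloc, toy_find_entry, toy_run_region, ...) are hypothetical and do not reproduce the actual libomptarget plugin interface, which is the one specified in the design document linked in (c). A real plugin would also have to place scalar arguments in device memory; the toy simply passes host pointers.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "device" that is just host memory, for illustration. */

/* Allocate `size` bytes of device memory. */
void *toy_data_alloc(size_t size) { return malloc(size); }

/* Copy host -> device. Returns 0 on success. */
int toy_data_submit(void *tgt_ptr, const void *hst_ptr, size_t size)
{
    memcpy(tgt_ptr, hst_ptr, size);
    return 0;
}

/* Copy device -> host. Returns 0 on success. */
int toy_data_retrieve(void *hst_ptr, const void *tgt_ptr, size_t size)
{
    memcpy(hst_ptr, tgt_ptr, size);
    return 0;
}

/* Release device memory. */
void toy_data_delete(void *tgt_ptr) { free(tgt_ptr); }

/* One entry point exported by a device image. */
typedef void (*toy_entry_fn)(void **args);
struct toy_entry {
    const char *name;
    toy_entry_fn fn;
};

/* Example kernel: args[0] -> double array, args[1] -> length (size_t),
   args[2] -> scale factor (double). */
static void scale_kernel(void **args)
{
    double *data = args[0];
    size_t n = *(size_t *)args[1];
    double factor = *(double *)args[2];
    for (size_t i = 0; i < n; ++i)
        data[i] *= factor;
}

/* The table a real plugin would build from the loaded device image. */
static const struct toy_entry toy_entries[] = {
    { "scale_kernel", scale_kernel },
};

/* Look up an entry point by name; NULL if absent. */
const struct toy_entry *toy_find_entry(const char *name)
{
    for (size_t i = 0; i < sizeof toy_entries / sizeof toy_entries[0]; ++i)
        if (strcmp(toy_entries[i].name, name) == 0)
            return &toy_entries[i];
    return NULL;
}

/* Launch an entry point with its argument list. Returns 0 on success. */
int toy_run_region(const struct toy_entry *entry, void **args)
{
    if (entry == NULL)
        return -1;
    entry->fn(args);
    return 0;
}
```

The usual sequence is alloc, submit, look up the entry, run, retrieve, delete, which mirrors the allocate / move data / entry table / launch responsibilities listed in (c).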
>
>         I should say that, in general, OpenMP requires a runtime library
>         for the device as well; however, if you do not use any OpenMP
>         pragmas inside your target code, you won't need it.
>
>         We have started porting our offloading-related code, currently on
>         GitHub, to upstream clang. The driver support is currently under
>         review at http://reviews.llvm.org/D9888. We are about to send our
>         first offloading codegen patches as well.
>
>         I understand that what Chris is proposing is somewhat different from
>         what we have in place, given that his transformations are intended
>         to operate on LLVM IR. However, the goal seems to be the same. I hope
>         the summary above gives you some hints on whether your use cases can
>         be accommodated.
>
>         Feel free to ask any questions you may have.
>
>         Thanks!
>
>         Samuel
>
>
>
>         2015-06-08 16:46 GMT-04:00 Sergey Ostanevich
>         <sergos.gnu at gmail.com>:
>
>              Roel,
>
>              You have to check out and build llvm/clang as usual.
>              For runtime support you'll have to build libomptarget and write
>              a plugin for your target. Samuel can help you further.
>              For OpenMP examples, I recommend
>              http://openmp.org/mp-documents/OpenMP4.0.0.Examples.pdf;
>              look at the target constructs.
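As a minimal illustration of the target constructs those examples cover, an array addition offloaded with OpenMP 4.0 pragmas might look like the sketch below. `vec_add` is a hypothetical name; the map clauses state which arrays are copied to the device and which are copied back. Without -fopenmp the pragma is ignored and the loop simply runs serially on the host.

```c
/* c[i] = a[i] + b[i], offloaded with an OpenMP 4.0 combined target construct. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    /* a and b are copied to the device; c is copied back when the region ends. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```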
>
>              Sergos
>
>
>              On Mon, Jun 8, 2015 at 6:13 PM, Roel Jordans
>              <r.jordans at tue.nl> wrote:
>               > Hi Sergos,
>               >
>               > I'd like to try this on our hardware. Is there some example
>               > code that I could use to get started?
>               >
>               > Cheers,
>               >  Roel
>               >
>               >
>               > On 08/06/15 13:27, Sergey Ostanevich wrote:
>               >>
>               >> Chirs,
>               >>
>               >> Have you seen the offloading infrastructure design proposal at
>               >> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html ?
>               >> It relies on the long-standing OpenMP standard, with recent
>               >> updates to support heterogeneous computation.
>               >> Could you please review it and comment on how it fits your needs?
>               >>
>               >> It's not quite clear from your proposal which source language
>               >> standard you plan to support; you just mention that OpenCL will be
>               >> one of your backends, as far as I understand. What's your plan for
>               >> sources: C/C++/FORTRAN?
>               >> How would you control offloading, data transfer, scheduling and so
>               >> on? Will it be new language constructs, similar to cilk_for in
>               >> Cilk Plus, or will it be pragma-based like OpenMP or OpenACC?
>               >>
>               >> The design I mentioned above has a working implementation for the
>               >> NVIDIA target at
>               >>
>               >> https://github.com/clang-omp/llvm_trunk
>               >> https://github.com/clang-omp/clang_trunk
>               >>
>               >> with runtime implemented at
>               >>
>               >> https://github.com/clang-omp/libomptarget
>               >>
>               >> You're welcome to try it out if you have an appropriate device.
>               >>
>               >> Regards,
>               >> Sergos
>               >>
>               >> On Sat, Jun 6, 2015 at 2:24 PM, Christos Margiolas
>               >> <chrmargiolas at gmail.com> wrote:
>               >>>
>               >>> Hello,
>               >>>
>               >>> Thank you very much for the feedback. I believe that the
>               >>> heterogeneous engine should be strongly connected with the
>               >>> parallelization and vectorization efforts. Most accelerators are
>               >>> parallel architectures, where efficient parallelization and
>               >>> vectorization can be critical for performance.
>               >>>
>               >>> I am interested in these efforts and I hope that my code can help
>               >>> you manage the offloading operations. Your LLVM instruction set
>               >>> extensions may require some changes in the analysis code, but I
>               >>> think that will be straightforward.
>               >>>
>               >>> I am planning to push my code to Phabricator in the next few days.
>               >>>
>               >>> thanks,
>               >>> Chris
>               >>>
>               >>>
>               >>> On Fri, Jun 5, 2015 at 3:45 AM, Adve, Vikram Sadanand
>               >>> <vadve at illinois.edu> wrote:
>               >>>>
>               >>>>
>               >>>> Christos,
>               >>>>
>               >>>> We would be very interested in learning more about this.
>               >>>>
>               >>>> In my group, we (Prakalp Srivastava, Maria Kotsifakou and I) have
>               >>>> been working on LLVM extensions to make it easier to target a wide
>               >>>> range of accelerators in a heterogeneous mobile device, such as
>               >>>> Qualcomm's Snapdragon and other APUs. Our approach has been to (a)
>               >>>> add better abstractions of parallelism to the LLVM instruction set
>               >>>> that can be mapped down to a wide range of parallel hardware
>               >>>> accelerators; and (b) develop optimizing "back-end" translators
>               >>>> to generate efficient code for the accelerators from the extended
>               >>>> IR.
>               >>>>
>               >>>> So far, we have been targeting GPUs and vector hardware, but
>               >>>> semi-custom (programmable) accelerators are our next goal. We have
>               >>>> also discussed DSPs as a valuable potential target.
>               >>>>
>               >>>> Judging from the brief information here, I'm guessing that our
>               >>>> projects have been quite complementary. We have not worked on the
>               >>>> extraction passes, scheduling, or other run-time components you
>               >>>> mention and would be happy to use an existing solution for those.
>               >>>> Our hope is that the IR extensions and translators will give your
>               >>>> schedulers greater flexibility to retarget the extracted code
>               >>>> components to different accelerators.
>               >>>>
>               >>>> --Vikram S. Adve
>               >>>> Visiting Professor, School of Computer and Communication
>              Sciences, EPFL
>               >>>> Professor, Department of Computer Science
>               >>>> University of Illinois at Urbana-Champaign
>               >>>> vadve at illinois.edu
>               >>>> http://llvm.org
>               >>>>
>               >>>>
>               >>>>
>               >>>>
>               >>>> On Jun 5, 2015, at 3:18 AM, llvmdev-request at cs.uiuc.edu wrote:
>               >>>>
>               >>>>> Date: Thu, 4 Jun 2015 17:35:25 -0700
>               >>>>> From: Christos Margiolas <chrmargiolas at gmail.com>
>               >>>>> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>               >>>>> Subject: [LLVMdev] Supporting heterogeneous computing in llvm.
>               >>>>> Message-ID:
>               >>>>>   <CAC3KUCx0mpBrnrGjDVxQzxtBpnJXtw3herZ_E2pQoSqSyMNsKA at mail.gmail.com>
>               >>>>> Content-Type: text/plain; charset="utf-8"
>               >>>>>
>               >>>>> Hello All,
>               >>>>>
>               >>>>> For the last two months I have been working on the design and
>               >>>>> implementation of a heterogeneous execution engine for LLVM. I
>               >>>>> started this project as an intern at the Qualcomm Innovation
>               >>>>> Center and I believe it can be useful for different people and
>               >>>>> use cases. I am planning to share more details and a set of
>               >>>>> patches in the next few days. However, I would first like to see
>               >>>>> whether there is interest in this.
>               >>>>>
>               >>>>> The project is about providing compiler and runtime support for
>               >>>>> the automatic and transparent offloading of loop or function
>               >>>>> workloads to accelerators.
>               >>>>>
>               >>>>> It is composed of the following:
>               >>>>> a) Compiler and transformation passes for extracting loops or
>               >>>>> functions for offloading.
>               >>>>> b) A runtime library that handles scheduling, data sharing and
>               >>>>> coherency between the host and accelerator sides.
>               >>>>> c) A modular codebase and design. Adaptors specialize the code
>               >>>>> transformations for the target accelerators. Runtime plugins
>               >>>>> manage the interaction with the different accelerator
>               >>>>> environments.
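One way to picture how components (a) to (c) fit together is the small sketch below: an extraction pass outlines a loop body into a function over an iteration range, and the runtime dispatches it through whichever plugin reports an available device, falling back to a host "debug" port like the one mentioned in the next paragraph. Every type and function name here is hypothetical and not taken from the actual patches; scheduling, data sharing and coherency are left out entirely.

```c
#include <stddef.h>

/* Outlined loop body over the iteration range [lo, hi). */
typedef void (*outlined_loop_fn)(long lo, long hi, void *ctx);

/* A runtime plugin for one accelerator environment (hypothetical). */
struct accel_plugin {
    const char *name;
    int (*available)(void);                                  /* device present? */
    int (*launch)(outlined_loop_fn fn, long lo, long hi, void *ctx);
};

/* Host/debug port: runs the outlined body on the CPU, so passes and
   runtime can be exercised without an accelerator. */
static int host_available(void) { return 1; }
static int host_launch(outlined_loop_fn fn, long lo, long hi, void *ctx)
{
    fn(lo, hi, ctx);
    return 0;
}
static const struct accel_plugin host_plugin = { "host-debug", host_available, host_launch };

/* Stand-in for an accelerator that is not present. */
static int never_available(void) { return 0; }
static int never_launch(outlined_loop_fn fn, long lo, long hi, void *ctx)
{
    (void)fn; (void)lo; (void)hi; (void)ctx;
    return -1;
}
const struct accel_plugin missing_dsp_plugin = { "dsp-missing", never_available, never_launch };

/* Pick the first available plugin, falling back to the host port. */
const struct accel_plugin *select_plugin(const struct accel_plugin *const *plugins,
                                         size_t count)
{
    for (size_t i = 0; i < count; ++i)
        if (plugins[i]->available())
            return plugins[i];
    return &host_plugin;
}

/* Offload a loop through the selected plugin; returns its status. */
int offload_loop(const struct accel_plugin *const *plugins, size_t count,
                 outlined_loop_fn fn, long lo, long hi, void *ctx)
{
    return select_plugin(plugins, count)->launch(fn, lo, hi, ctx);
}

/* What extraction might produce from: for (i...) out[i] = in[i] * in[i]; */
struct square_ctx { const double *in; double *out; };
void square_body(long lo, long hi, void *ctx)
{
    struct square_ctx *c = ctx;
    for (long i = lo; i < hi; ++i)
        c->out[i] = c->in[i] * c->in[i];
}
```

Passing the loop bounds separately from the captured context is one simple way to let a scheduler split the range across devices later; it is a design assumption of this sketch, not of the proposal.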
>               >>>>>
>               >>>>> So far, this work supports the Qualcomm DSP accelerator, but I am
>               >>>>> planning to extend it to support OpenCL accelerators. I have also
>               >>>>> developed a debug port where I can test the passes and the runtime
>               >>>>> without requiring an accelerator.
>               >>>>>
>               >>>>>
>               >>>>> The project is still at an early R&D stage and I am looking forward
>               >>>>> to feedback and to gauging the interest level. I am willing to
>               >>>>> continue working on this as an open source project and bring it
>               >>>>> into the right shape so it can be merged into the LLVM tree.
>               >>>>>
>               >>>>>
>               >>>>> Regards,
>               >>>>> Chris
>               >>>>>
>               >>>>> P.S. I intend to join the LLVM social in the Bay Area tonight and I
>               >>>>> will be more than happy to talk about it.
>               >>>>
>               >>>>
>               >>>>
>               >>>> _______________________________________________
>               >>>> LLVM Developers mailing list
>               >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>               >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>               >>>
>               >>>
>               >>
>
>
>