[LLVMdev] Supporting heterogeneous computing in llvm.
Samuel Antão
samuelfantao at gmail.com
Tue Jun 9 08:23:49 PDT 2015
Hi Roel,
You'd have to set LIBRARY_PATH to point to where libtarget-nvptx.a lives.
At the moment, we are not translating the -L options for the target; they
are considered to be meant for the host only. I should probably extend the
documentation to explain this detail.
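As a concrete sketch of the workaround (the install prefix below is an assumption; adjust it to wherever your libomptarget build placed libtarget-nvptx.a):

```shell
# Hypothetical install location; adjust to where your libomptarget build
# placed libtarget-nvptx.a.
OMPTARGET_LIB_DIR="$HOME/libomptarget/lib"

# nvlink consults LIBRARY_PATH, so prepend the directory to it.
export LIBRARY_PATH="$OMPTARGET_LIB_DIR${LIBRARY_PATH:+:$LIBRARY_PATH}"
echo "$LIBRARY_PATH"
```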
Thanks,
Samuel
2015-06-09 9:32 GMT-04:00 Roel Jordans <r.jordans at tue.nl>:
> Hi Sergos and Samuel,
>
> Thanks for the links, I've got it mostly working now.
>
> I still have a problem with linking the code. It seems that the clang
> driver doesn't pass its library search path to nvlink when linking the
> generated CUDA code to the target library, resulting in it not correctly
> finding libtarget-nvptx.a. Is there some flag or environment variable that
> I should set here? Manually providing nvlink with a -L flag pointing to
> the appropriate path seems to work for the linking step.
>
> Cheers,
> Roel
>
> On 09/06/15 00:07, Samuel Antão wrote:
>
>> Hi Roel, Chris,
>>
>> This is a summary of how you can add support for a different
>> offloading device on top of what we have on GitHub for OpenMP:
>>
>> a) Download and install llvm (https://github.com/clang-omp/llvm_trunk)
>> and clang (https://github.com/clang-omp/clang_trunk) as usual.
>>
>> b) Install the official llvm OpenMP runtime library from openmp.llvm.org
>> <http://openmp.llvm.org>. Clang will expect it to be present in your
>> library path in order to compile OpenMP code (even if you do not need
>> any OpenMP feature other than offloading).
>>
>> c) Install https://github.com/clang-omp/libomptarget (running 'make'
>> should do it). This library implements the API that controls offloading.
>> It also contains, in ./RTLs, a set of plugins for the targets we are
>> testing this with - x86_64, powerpc64 and NVPTX. You will need to
>> implement a plugin for your target as well. The interface used for these
>> plugins is detailed in the document proposed in
>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html. You
>> can look at the existing plugins for hints. In a nutshell, you would
>> have to implement code that allocates memory on and moves data to your
>> device, returns a table of entry points and global variables given a
>> device library, and launches execution of a given entry point with the
>> provided list of arguments.
>>
>> d) The current implementation expects the device library to use the ELF
>> format. There is no reason for that other than that the platforms we
>> have tested this with so far use ELF. If your device does not use ELF,
>> __tgt_register_lib() (src/omptarget.cpp) would have to be extended
>> to understand your desired format. Otherwise, you may just update
>> src/targets_info.cpp with your ELF ID and plugin name.
>>
>> e) Offloading is driven by clang, so clang has to be aware of the
>> toolchain required by your device. If your device toolchain is not
>> implemented in clang, you would have to do that in
>> lib/Driver/ToolChains.cpp.
>>
>> f) Once everything is in place, you can compile your code by running
>> something like "clang -fopenmp -omptargets=your-target-triple app.c". If
>> you do separate compilation, you will see that two different files are
>> generated for a given source file (the target file has the suffix
>> tgt-your-target-triple).
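Concretely, assuming an NVPTX triple (both the triple and the exact output naming are assumptions; check what your toolchain actually emits):

```shell
# Hypothetical triple; substitute the one matching your libomptarget plugin.
TRIPLE="nvptx64-nvidia-cuda"

# Whole-program build (not run here; requires the clang-omp toolchain):
#   clang -fopenmp -omptargets=$TRIPLE app.c -o app

# With separate compilation (-c), two files appear per source file: the
# host object plus a target file carrying a tgt-<triple> suffix.
HOST_OBJ="app.o"
TARGET_OBJ="app.o.tgt-$TRIPLE"   # exact naming scheme is an assumption
echo "$HOST_OBJ $TARGET_OBJ"
```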
>>
>> I should say that, in general, OpenMP requires a runtime library for the
>> device as well; however, if you do not use any OpenMP pragmas inside your
>> target code, you won't need one.
>>
>> We have started porting our offloading-related code currently on GitHub
>> to upstream clang. The driver support is currently under review at
>> http://reviews.llvm.org/D9888. We are about to send our first offloading
>> codegen patches as well.
>>
>> I understand that what Chris is proposing is somewhat different from
>> what we have in place, given that the transformations are intended to be
>> done in LLVM IR. However, the goal seems to be the same. I hope the
>> summary above gives you some hints on whether your use cases can be
>> accommodated.
>>
>> Feel free to ask any questions you may have.
>>
>> Thanks!
>>
>> Samuel
>>
>>
>>
>> 2015-06-08 16:46 GMT-04:00 Sergey Ostanevich <sergos.gnu at gmail.com
>> <mailto:sergos.gnu at gmail.com>>:
>>
>> Roel,
>>
>> You have to check out and build llvm/clang as usual.
>> For runtime support, you'll have to build libomptarget and write a
>> plugin for your target. Samuel can help you some more.
>> As for OpenMP examples, I can recommend
>> http://openmp.org/mp-documents/OpenMP4.0.0.Examples.pdf -
>> look into the target constructs.
>>
>> Sergos
>>
>>
>> On Mon, Jun 8, 2015 at 6:13 PM, Roel Jordans <r.jordans at tue.nl
>> <mailto:r.jordans at tue.nl>> wrote:
>> > Hi Sergos,
>> >
>> > I'd like to try this on our hardware. Is there some example code
>> that I
>> > could use to get started?
>> >
>> > Cheers,
>> > Roel
>> >
>> >
>> > On 08/06/15 13:27, Sergey Ostanevich wrote:
>> >>
>> >> Chris,
>> >>
>> >> Have you seen the offloading infrastructure design proposal at
>> >> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html ?
>> >> It relies on the long-standing OpenMP standard, with recent updates to
>> >> support heterogeneous computations.
>> >> Could you please review it and comment on how it fits your needs?
>> >>
>> >> It's not quite clear from your proposal what source language standard
>> >> you plan to support - you just mention that OpenCL will be one of
>> >> your backends, as far as I got it. What's your plan on sources -
>> >> C/C++/Fortran?
>> >> How would you control the offloading, data transfer, scheduling and so
>> >> on? Will it be new language constructs, similar to parallel_for in
>> >> Cilk Plus, or will it be pragma-based like in OpenMP or OpenACC?
>> >>
>> >> The design I mentioned above has an operable implementation for
>> >> NVIDIA targets at
>> >>
>> >> https://github.com/clang-omp/llvm_trunk
>> >> https://github.com/clang-omp/clang_trunk
>> >>
>> >> with the runtime implemented at
>> >>
>> >> https://github.com/clang-omp/libomptarget
>> >>
>> >> You're welcome to try it out if you have an appropriate device.
>> >>
>> >> Regards,
>> >> Sergos
>> >>
>> >> On Sat, Jun 6, 2015 at 2:24 PM, Christos Margiolas
>> >> <chrmargiolas at gmail.com <mailto:chrmargiolas at gmail.com>> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> Thank you very much for the feedback. I believe that the heterogeneous
>> >>> engine should be strongly connected with the parallelization and
>> >>> vectorization efforts. Most accelerators are parallel architectures
>> >>> where efficient parallelization and vectorization can be critical for
>> >>> performance.
>> >>>
>> >>> I am interested in these efforts and I hope that my code can help you
>> >>> manage the offloading operations. Your LLVM instruction set extensions
>> >>> may require some changes in the analysis code, but I think that is
>> >>> going to be straightforward.
>> >>>
>> >>> I am planning to push my code to Phabricator in the coming days.
>> >>>
>> >>> thanks,
>> >>> Chris
>> >>>
>> >>>
>> >>> On Fri, Jun 5, 2015 at 3:45 AM, Adve, Vikram Sadanand
>> >>> <vadve at illinois.edu <mailto:vadve at illinois.edu>>
>>
>> >>> wrote:
>> >>>>
>> >>>>
>> >>>> Christos,
>> >>>>
>> >>>> We would be very interested in learning more about this.
>> >>>>
>> >>>> In my group, we (Prakalp Srivastava, Maria Kotsifakou and I)
>> have been
>> >>>> working on LLVM extensions to make it easier to target a wide
>> range of
>> >>>> accelerators in a heterogeneous mobile device, such as
>> Qualcomm's
>> >>>> Snapdragon
>> >>>> and other APUs. Our approach has been to (a) add better
>> abstractions of
>> >>>> parallelism to the LLVM instruction set that can be mapped
>> down to a
>> >>>> wide
>> >>>> range of parallel hardware accelerators; and (b) to develop
>> optimizing
>> >>>> "back-end" translators to generate efficient code for the
>> accelerators
>> >>>> from
>> >>>> the extended IR.
>> >>>>
>> >>>> So far, we have been targeting GPUs and vector hardware, but
>> semi-custom
>> >>>> (programmable) accelerators are our next goal. We have
>> discussed DSPs
>> >>>> as a
>> >>>> valuable potential goal as well.
>> >>>>
>> >>>> Judging from the brief information here, I'm guessing that our
>> projects
>> >>>> have been quite complementary. We have not worked on the
>> extraction
>> >>>> passes,
>> >>>> scheduling, or other run-time components you mention and would
>> be happy
>> >>>> to
>> >>>> use an existing solution for those. Our hope is that the IR
>> extensions
>> >>>> and
>> >>>> translators will give your schedulers greater flexibility to
>> retarget
>> >>>> the
>> >>>> extracted code components to different accelerators.
>> >>>>
>> >>>> --Vikram S. Adve
>> >>>> Visiting Professor, School of Computer and Communication
>> Sciences, EPFL
>> >>>> Professor, Department of Computer Science
>> >>>> University of Illinois at Urbana-Champaign
>> >>>> vadve at illinois.edu <mailto:vadve at illinois.edu>
>> >>>> http://llvm.org
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Jun 5, 2015, at 3:18 AM, llvmdev-request at cs.uiuc.edu
>> <mailto:llvmdev-request at cs.uiuc.edu> wrote:
>> >>>>
>> >>>>> Date: Thu, 4 Jun 2015 17:35:25 -0700
>> >>>>> From: Christos Margiolas <chrmargiolas at gmail.com
>> <mailto:chrmargiolas at gmail.com>>
>> >>>>> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu
>> <mailto:llvmdev at cs.uiuc.edu>>
>> >>>>> Subject: [LLVMdev] Supporting heterogeneous computing in llvm.
>> >>>>> Message-ID:
>> >>>>> <CAC3KUCx0mpBrnrGjDVxQzxtBpnJXtw3herZ_E2pQoSqSyMNsKA at mail.gmail.com>
>> >>>>> Content-Type: text/plain; charset="utf-8"
>> >>>>>
>> >>>>> Hello All,
>> >>>>>
>> >>>>> For the last two months I have been working on the design and
>> >>>>> implementation of a heterogeneous execution engine for LLVM. I
>> >>>>> started this project as an intern at the Qualcomm Innovation Center
>> >>>>> and I believe it can be useful to different people and use cases. I
>> >>>>> am planning to share more details and a set of patches in the coming
>> >>>>> days. However, I would first like to see if there is interest in
>> >>>>> this.
>> >>>>>
>> >>>>> The project is about providing compiler and runtime support
>> for the
>> >>>>> automatic and transparent offloading of loop or function
>> workloads to
>> >>>>> accelerators.
>> >>>>>
>> >>>>> It is composed of the following:
>> >>>>> a) Compiler and transformation passes for extracting loops or
>> >>>>> functions for offloading.
>> >>>>> b) A runtime library that handles scheduling, data sharing and
>> >>>>> coherency between the host and accelerator sides.
>> >>>>> c) A modular codebase and design. Adaptors specialize the code
>> >>>>> transformations for the target accelerators. Runtime plugins manage
>> >>>>> the interaction with the different accelerator environments.
>> >>>>>
>> >>>>> So far, this work supports the Qualcomm DSP accelerator, but I am
>> >>>>> planning to extend it to support OpenCL accelerators. I have also
>> >>>>> developed a debug port where I can test the passes and the runtime
>> >>>>> without requiring an accelerator.
>> >>>>>
>> >>>>>
>> >>>>> The project is still at an early R&D stage and I am looking forward
>> >>>>> to feedback and to gauging the level of interest. I am willing to
>> >>>>> continue working on this as an open-source project and bring it into
>> >>>>> the right shape so it can be merged into the LLVM tree.
>> >>>>>
>> >>>>>
>> >>>>> Regards,
>> >>>>> Chris
>> >>>>>
>> >>>>> P.S. I intend to join the LLVM social in the Bay Area tonight and I
>> >>>>> will be more than happy to talk about it.
>> >>>>
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> LLVM Developers mailing list
>> >>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>> http://llvm.cs.uiuc.edu
>> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>>
>>
>>