[LLVMdev] Supporting heterogeneous computing in llvm.
Samuel Antão
samuelfantao at gmail.com
Tue Jun 9 08:23:49 PDT 2015
Hi Roel,
You'd have to set LIBRARY_PATH to point to where libtarget-nvptx.a lives.
At the moment, we are not translating the -L options for the target; they
are considered to be meant for the host only. I should probably extend the
documentation to explain this detail.
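As a concrete sketch of the workaround (the install prefix below is an assumption; adjust it to wherever your libomptarget build placed libtarget-nvptx.a):

```shell
# Hypothetical install location; adjust to where your libomptarget build
# placed libtarget-nvptx.a.
OMPTARGET_LIB_DIR="$HOME/libomptarget/lib"

# nvlink consults LIBRARY_PATH, so prepend the directory to it.
export LIBRARY_PATH="$OMPTARGET_LIB_DIR${LIBRARY_PATH:+:$LIBRARY_PATH}"
echo "$LIBRARY_PATH"
```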
Thanks,
Samuel
2015-06-09 9:32 GMT-04:00 Roel Jordans <r.jordans at tue.nl>:
> Hi Sergos and Samuel,
>
> Thanks for the links, I've got it mostly working now.
>
> I still have a problem with linking the code. It seems that the clang
> driver doesn't pass its library search path to nvlink when linking the
> generated CUDA code to the target library, resulting in it not correctly
> finding libtarget-nvptx.a. Is there some flag or environment variable that
> I should set here? Manually providing nvlink with a -L flag pointing to
> the appropriate path seems to work for the linking step.
>
> Cheers,
> Roel
>
> On 09/06/15 00:07, Samuel Antão wrote:
>
>> Hi Roel, Chris,
>>
>> This is a summary of how you can add support for a different
>> offloading device on top of what we have on GitHub for OpenMP:
>>
>> a) Download and install llvm (https://github.com/clang-omp/llvm_trunk)
>> and clang (https://github.com/clang-omp/clang_trunk) as usual.
>>
>> b) Install the official llvm OpenMP runtime library from openmp.llvm.org
>> <http://openmp.llvm.org>. Clang will expect it to be present in your
>> library path in order to compile OpenMP code (even if you do not need
>> any OpenMP feature other than offloading).
>>
>> c) Install https://github.com/clang-omp/libomptarget (running 'make'
>> should do it). This library implements the API that controls offloading.
>> It also contains, in ./RTLs, a set of plugins for the targets we are
>> testing this with - x86_64, powerpc64 and NVPTX. You will need to
>> implement a plugin for your target as well. The interface used for these
>> plugins is detailed in the document proposed in
>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html. You
>> can look at the existing plugins for hints. In a nutshell, you would
>> have to implement code that allocates memory on and moves data to your
>> device, returns a table of entry points and global variables given a
>> device library, and launches execution of a given entry point with the
>> provided list of arguments.
>>
>> d) The current implementation expects the device library to use the ELF
>> format. There is no reason for that other than that the platforms we
>> have tested this with so far use ELF. If your device does not use ELF,
>> __tgt_register_lib() (src/omptarget.cpp) would have to be extended
>> to understand your desired format. Otherwise, you may just update
>> src/targets_info.cpp with your ELF ID and plugin name.
>>
>> e) Offloading is driven by clang, so clang has to be aware of the
>> toolchain required by your device. If your device toolchain is not
>> implemented in clang, you would have to do that in
>> lib/Driver/ToolChains.cpp.
>>
>> f) Once everything is in place, you can compile your code by running
>> something like "clang -fopenmp -omptargets=your-target-triple app.c". If
>> you do separate compilation, you will see that two different files are
>> generated for a given source file (the target file has the suffix
>> tgt-your-target-triple).
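Concretely, assuming an NVPTX triple (both the triple and the exact output naming are assumptions; check what your toolchain actually emits):

```shell
# Hypothetical triple; substitute the one matching your libomptarget plugin.
TRIPLE="nvptx64-nvidia-cuda"

# Whole-program build (not run here; requires the clang-omp toolchain):
#   clang -fopenmp -omptargets=$TRIPLE app.c -o app

# With separate compilation (-c), two files appear per source file: the
# host object plus a target file carrying a tgt-<triple> suffix.
HOST_OBJ="app.o"
TARGET_OBJ="app.o.tgt-$TRIPLE"   # exact naming scheme is an assumption
echo "$HOST_OBJ $TARGET_OBJ"
```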
>>
>> I should say that, in general, OpenMP requires a runtime library for the
>> device as well; however, if you do not use any OpenMP pragmas inside your
>> target code, you won't need one.
>>
>> We have started porting our offloading-related code currently on GitHub
>> to upstream clang. The driver support is currently under review at
>> http://reviews.llvm.org/D9888. We are about to send our first offloading
>> codegen patches as well.
>>
>> I understand that what Chris is proposing is somewhat different from
>> what we have in place, given that the transformations are intended to be
>> done in LLVM IR. However, the goal seems to be the same. I hope the
>> summary above gives you some hints on whether your use cases can be
>> accommodated.
>>
>> Feel free to ask any questions you may have.
>>
>> Thanks!
>>
>> Samuel
>>
>>
>>
>> 2015-06-08 16:46 GMT-04:00 Sergey Ostanevich <sergos.gnu at gmail.com
>> <mailto:sergos.gnu at gmail.com>>:
>>
>> Roel,
>>
>> You have to check out and build llvm/clang as usual.
>> For runtime support, you'll have to build libomptarget and write a
>> plugin for your target. Samuel can help you some more.
>> As for OpenMP examples, I can recommend
>> http://openmp.org/mp-documents/OpenMP4.0.0.Examples.pdf -
>> look into the target constructs.
>>
>> Sergos
>>
>>
>> On Mon, Jun 8, 2015 at 6:13 PM, Roel Jordans <r.jordans at tue.nl
>> <mailto:r.jordans at tue.nl>> wrote:
>> > Hi Sergos,
>> >
>> > I'd like to try this on our hardware. Is there some example code
>> that I
>> > could use to get started?
>> >
>> > Cheers,
>> > Roel
>> >
>> >
>> > On 08/06/15 13:27, Sergey Ostanevich wrote:
>> >>
>> >> Chris,
>> >>
>> >> Have you seen the offloading infrastructure design proposal at
>> >> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html ?
>> >> It relies on the long-standing OpenMP standard, with recent updates to
>> >> support heterogeneous computations.
>> >> Could you please review it and comment on how it fits your needs?
>> >>
>> >> It's not quite clear from your proposal what source language standard
>> >> you plan to support - you just mention that OpenCL will be one of
>> >> your backends, as far as I got it. What's your plan on sources -
>> >> C/C++/Fortran?
>> >> How would you control the offloading, data transfer, scheduling and so
>> >> on? Will it be new language constructs, similar to parallel_for in
>> >> Cilk Plus, or will it be pragma-based like in OpenMP or OpenACC?
>> >>
>> >> The design I mentioned above has an operable implementation for
>> >> NVIDIA targets at
>> >>
>> >> https://github.com/clang-omp/llvm_trunk
>> >> https://github.com/clang-omp/clang_trunk
>> >>
>> >> with the runtime implemented at
>> >>
>> >> https://github.com/clang-omp/libomptarget
>> >>
>> >> You're welcome to try it out if you have an appropriate device.
>> >>
>> >> Regards,
>> >> Sergos
>> >>
>> >> On Sat, Jun 6, 2015 at 2:24 PM, Christos Margiolas
>> >> <chrmargiolas at gmail.com <mailto:chrmargiolas at gmail.com>> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> Thank you very much for the feedback. I believe that the heterogeneous
>> >>> engine should be strongly connected with the parallelization and
>> >>> vectorization efforts. Most accelerators are parallel architectures
>> >>> where efficient parallelization and vectorization can be critical for
>> >>> performance.
>> >>>
>> >>> I am interested in these efforts and I hope that my code can help you
>> >>> manage the offloading operations. Your LLVM instruction set extensions
>> >>> may require some changes in the analysis code, but I think that is
>> >>> going to be straightforward.
>> >>>
>> >>> I am planning to push my code to Phabricator in the coming days.
>> >>>
>> >>> thanks,
>> >>> Chris
>> >>>
>> >>>
>> >>> On Fri, Jun 5, 2015 at 3:45 AM, Adve, Vikram Sadanand
>> >>> <vadve at illinois.edu <mailto:vadve at illinois.edu>>
>>
>> >>> wrote:
>> >>>>
>> >>>>
>> >>>> Christos,
>> >>>>
>> >>>> We would be very interested in learning more about this.
>> >>>>
>> >>>> In my group, we (Prakalp Srivastava, Maria Kotsifakou and I)
>> have been
>> >>>> working on LLVM extensions to make it easier to target a wide
>> range of
>> >>>> accelerators in a heterogeneous mobile device, such as
>> Qualcomm's
>> >>>> Snapdragon
>> >>>> and other APUs. Our approach has been to (a) add better
>> abstractions of
>> >>>> parallelism to the LLVM instruction set that can be mapped
>> down to a
>> >>>> wide
>> >>>> range of parallel hardware accelerators; and (b) to develop
>> optimizing
>> >>>> "back-end" translators to generate efficient code for the
>> accelerators
>> >>>> from
>> >>>> the extended IR.
>> >>>>
>> >>>> So far, we have been targeting GPUs and vector hardware, but
>> semi-custom
>> >>>> (programmable) accelerators are our next goal. We have
>> discussed DSPs
>> >>>> as a
>> >>>> valuable potential goal as well.
>> >>>>
>> >>>> Judging from the brief information here, I'm guessing that our
>> projects
>> >>>> have been quite complementary. We have not worked on the
>> extraction
>> >>>> passes,
>> >>>> scheduling, or other run-time components you mention and would
>> be happy
>> >>>> to
>> >>>> use an existing solution for those. Our hope is that the IR
>> extensions
>> >>>> and
>> >>>> translators will give your schedulers greater flexibility to
>> retarget
>> >>>> the
>> >>>> extracted code components to different accelerators.
>> >>>>
>> >>>> --Vikram S. Adve
>> >>>> Visiting Professor, School of Computer and Communication
>> Sciences, EPFL
>> >>>> Professor, Department of Computer Science
>> >>>> University of Illinois at Urbana-Champaign
>> >>>> vadve at illinois.edu <mailto:vadve at illinois.edu>
>> >>>> http://llvm.org
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Jun 5, 2015, at 3:18 AM, llvmdev-request at cs.uiuc.edu
>> <mailto:llvmdev-request at cs.uiuc.edu> wrote:
>> >>>>
>> >>>>> Date: Thu, 4 Jun 2015 17:35:25 -0700
>> >>>>> From: Christos Margiolas <chrmargiolas at gmail.com
>> <mailto:chrmargiolas at gmail.com>>
>> >>>>> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu
>> <mailto:llvmdev at cs.uiuc.edu>>
>> >>>>> Subject: [LLVMdev] Supporting heterogeneous computing in llvm.
>> >>>>> Message-ID:
>> >>>>> <CAC3KUCx0mpBrnrGjDVxQzxtBpnJXtw3herZ_E2pQoSqSyMNsKA at mail.gmail.com>
>> >>>>> Content-Type: text/plain; charset="utf-8"
>> >>>>>
>> >>>>> Hello All,
>> >>>>>
>> >>>>> For the last two months I have been working on the design and
>> >>>>> implementation of a heterogeneous execution engine for LLVM. I
>> >>>>> started this project as an intern at the Qualcomm Innovation Center
>> >>>>> and I believe it can be useful to different people and use cases. I
>> >>>>> am planning to share more details and a set of patches in the coming
>> >>>>> days. However, I would first like to see if there is interest in
>> >>>>> this.
>> >>>>>
>> >>>>> The project is about providing compiler and runtime support
>> for the
>> >>>>> automatic and transparent offloading of loop or function
>> workloads to
>> >>>>> accelerators.
>> >>>>>
>> >>>>> It is composed of the following:
>> >>>>> a) Compiler and transformation passes for extracting loops or
>> >>>>> functions for offloading.
>> >>>>> b) A runtime library that handles scheduling, data sharing and
>> >>>>> coherency between the host and accelerator sides.
>> >>>>> c) A modular codebase and design. Adaptors specialize the code
>> >>>>> transformations for the target accelerators. Runtime plugins manage
>> >>>>> the interaction with the different accelerator environments.
>> >>>>>
>> >>>>> So far, this work supports the Qualcomm DSP accelerator, but I am
>> >>>>> planning to extend it to support OpenCL accelerators. I have also
>> >>>>> developed a debug port where I can test the passes and the runtime
>> >>>>> without requiring an accelerator.
>> >>>>>
>> >>>>>
>> >>>>> The project is still at an early R&D stage and I am looking forward
>> >>>>> to feedback and to gauging the level of interest. I am willing to
>> >>>>> continue working on this as an open-source project and bring it into
>> >>>>> the right shape so it can be merged into the LLVM tree.
>> >>>>>
>> >>>>>
>> >>>>> Regards,
>> >>>>> Chris
>> >>>>>
>> >>>>> P.S. I intend to join the LLVM social in the Bay Area tonight and I
>> >>>>> will be more than happy to talk about it.
>> >>>>
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> LLVM Developers mailing list
>> >>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>> http://llvm.cs.uiuc.edu
>> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>>
>>
>>