[llvm-dev] Can I control HSA config generated by AMDGPU backend?

UE US via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 6 20:22:52 PDT 2018


This page https://gpuopen.com/opencl-rocm1-6/ also suggests that inline asm
is supported by the rocm toolchain, and there are example exercises /
solutions here:

https://github.com/HandsOnOpenCL/Exercises-Solutions/tree/master/Solutions

The AMD PRO driver claims to have supported rocm 1.6 since last year, but
from what you describe that doesn't seem to work with it, so I'm not sure
what is going on there.

-G


On Thu, Sep 6, 2018 at 10:11 PM UE US <uexplorer666 at gmail.com> wrote:

>
>
> On Wed, Sep 5, 2018 at 1:17 PM Changdao Dong via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>> I finally modified llvm to generate assembly that can run on the
>> AMDGPU pro drivers. One problem is that the code generated by llvm is
>> about 10% slower than the code from amdgpu's online compiler. Is there
>> anything I can tune to improve the performance of the llvm output?
>>
>> Thanks!
>>
>> On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote:
>>
>>> I am writing a cryptocurrency miner, which most users run with the
>>> amdgpu driver. I have written a script that translates the metadata of
>>> the LLVM ISA format into clrxasm format.
>>>
>>
> clrxasm's docs say it only supports GCN devices to begin with, so it seems
> like you wouldn't actually want to use the --amdhsa "os" flag (or the
> amdgpu target, you'd want amdgcn);  that's for things that will be directly
> loaded with the HSA API as far as I know.  If you felt like it you could
> load and execute them with that API instead of the opencl one and not mess
> around with it further than that.  I've never worked with that, so Artem
> can probably tell you more if that doesn't explain things.  It looks
> relatively straightforward.
> https://gpuopen.com/rocm-with-harmony-combining-opencl-hcc-hsa-in-a-single-program/
>
> This page https://openwall.info/wiki/john/development/AMD-IL (linked
> from another AMD list posting last year about something similar) says
> that the following work:
>
> (i) Setting the environment variable:
> AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps ./Name_of_executable
> (ii) Using the build options:
> In clBuildProgram(), specify "-save-temps" in the build options field to
> generate IL and ISA.
>
> ...and the driver will retain the .isa and .il files, but then you'd still
> be left with patching in your changes somehow.   If that works it would at
> least give you an example of what LLVM is currently generating vs. the
> driver so you can compare those and also modify / test assembly changes to
> determine if they're worthwhile for whatever issue you're trying to solve.
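
A rough sketch of that compare workflow, in Python, in case it helps. It
assumes the AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps route described above
behaves as that wiki page says. The helper names and the file labels
"driver.isa" and "llvm.s" are made up for illustration, and where the driver
actually writes its temporaries is not something I have checked.

```python
import difflib
import os
import subprocess


def env_with_save_temps(base_env=None):
    """Return a copy of the environment with -save-temps added to
    AMD_OCL_BUILD_OPTIONS_APPEND (existing options are preserved)."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("AMD_OCL_BUILD_OPTIONS_APPEND", "")
    if "-save-temps" not in existing.split():
        env["AMD_OCL_BUILD_OPTIONS_APPEND"] = (existing + " -save-temps").strip()
    return env


def run_with_save_temps(argv, cwd=None):
    """Run an OpenCL program with the variable set, so the driver is asked
    to keep its .il / .isa temporaries (hypothetical helper)."""
    return subprocess.run(argv, cwd=cwd, env=env_with_save_temps(), check=True)


def diff_isa(driver_isa, llvm_isa):
    """Return a unified diff (list of lines) between two ISA text dumps."""
    return list(difflib.unified_diff(
        driver_isa.splitlines(), llvm_isa.splitlines(),
        fromfile="driver.isa", tofile="llvm.s", lineterm=""))
```

Usage would be something like run_with_save_temps(["./Name_of_executable"]),
then reading the retained .isa file and the llvm-generated assembly as text
and printing "\n".join(diff_isa(driver_text, llvm_text)). A plain textual
diff will be noisy if register allocation differs, so treat it as a starting
point for spotting instruction-level differences.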
>
> If this is an optimization thing, I'd strongly suggest going through the
> files as-is and trying to perform some of the ocl-level optimizations AMD's
> guides suggest.  You'd be surprised what removing a couple of conditionals
> in often-called loops can do for performance of many things.    Looking at
> the code, vectorizing / using native opencl data types would probably show
> some gains as well.  Many of them seem to be straight C source conversions
> of stuff that was optimized for x86 at some point before SSE2 existed and
> promptly  forgotten.
>
> Cheers,
> -G
>