[llvm-dev] Can I control HSA config generated by AMDGPU backend?

Sun Sep 9 00:42:47 PDT 2018

Finally I got something working. The speed went up after I disabled loop
unrolling and replaced "get_local_size(1)" with a constant. So LLVM is indeed
a very good compiler, comparable to AMD's own.
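
For anyone finding this thread later, here is a minimal sketch of those two
changes. Everything in it is an assumption made for illustration: the kernel
"sum_rows", the macro name LOCAL_SIZE_1 and the helper make_build_options()
are hypothetical, not the miner's real code. "#pragma unroll 1" is one common
way to ask a clang-based compiler not to unroll a loop; whether a given
driver compiler honors it is not something this thread confirms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical OpenCL kernel source, for illustration only. */
const char *const kernel_src =
    "__kernel void sum_rows(__global const uint *in, __global uint *out)\n"
    "{\n"
    "    uint row = get_global_id(0);\n"
    "    uint acc = 0;\n"
    "    /* LOCAL_SIZE_1 stands in for a runtime get_local_size(1) call. */\n"
    "#pragma unroll 1\n"
    "    for (uint i = 0; i < LOCAL_SIZE_1; ++i)\n"
    "        acc += in[row * LOCAL_SIZE_1 + i];\n"
    "    out[row] = acc;\n"
    "}\n";

/* Writes "-D LOCAL_SIZE_1=<n>" into buf.
 * Returns 0 on success, -1 if buf is too small to hold the string. */
int make_build_options(char *buf, size_t size, unsigned local_size_1)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "-D LOCAL_SIZE_1=%u", local_size_1);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string would go in the options argument of clBuildProgram(),
and the value has to match the local size actually used when the kernel is
enqueued, since the kernel no longer queries it at run time.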

On Thu, Sep 6, 2018 at 8:23 PM UE US <uexplorer666 at gmail.com> wrote:

> This page https://gpuopen.com/opencl-rocm1-6/ also suggests that inline
> asm is supported by the rocm toolchain, and there are example exercises /
> solutions here:
>
> https://github.com/HandsOnOpenCL/Exercises-Solutions/tree/master/Solutions
>
> The AMD PRO driver says it has supported rocm 1.6 since last year, but it
> sounds like that doesn't work with it, so ???
>
> -G
>
>
> On Thu, Sep 6, 2018 at 10:11 PM UE US <uexplorer666 at gmail.com> wrote:
>
>>
>>
>> On Wed, Sep 5, 2018 at 1:17 PM Changdao Dong via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>> Finally I modified LLVM, more or less, to generate assembly that can run
>>> on the AMDGPU-PRO driver. One problem is that the code generated by LLVM
>>> is about 10% slower than the code from amdgpu's online compiler. Is there
>>> anything I can tune to improve the performance of the LLVM output?
>>>
>>> Thanks!
>>>
>>> On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote:
>>>
>>>> I am writing a cryptocurrency miner, which most users run with the
>>>> amdgpu driver. I have written a script that translates the metadata of
>>>> the LLVM ISA format into clrxasm format.
>>>>
>>>
>> clrxasm's docs say it only supports GCN devices to begin with, so it
>> seems like you wouldn't actually want to use the --amdhsa "os" flag (or the
>> amdgpu target, you'd want amdgcn);  that's for things that will be directly
>> loaded with the HSA API as far as I know.  If you felt like it you could
>> load and execute them with that API instead of the opencl one and not mess
>> around with it further than that.  I've never worked with that, so Artem
>> can probably tell you more if that doesn't explain things.  It looks
>> relatively straightforward.
>> https://gpuopen.com/rocm-with-harmony-combining-opencl-hcc-hsa-in-a-single-program/
>>
>> This page  https://openwall.info/wiki/john/development/AMD-IL (linked
>> from another AMD list posting last year about something similar)   says
>> that the following work:
>>
>> *(i)* Setting the environment variable:
>> AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps ./Name_of_executable
>> *(ii)* Using the build options:
>> In clBuildProgram(), specify "-save-temps" in the build options field to
>> generate IL and ISA.
>>
>> ...and the driver will retain the .isa and .il files, but then you'd
>> still be left with patching in your changes somehow.   If that works it
>> would at least give you an example of what LLVM is currently generating vs.
>> the driver so you can compare those and also modify / test assembly changes
>> to determine if they're worthwhile for whatever issue you're trying to
>> solve.
>>
>> If this is an optimization thing, I'd strongly suggest going through the
>> files as-is and trying to perform some of the ocl-level optimizations AMD's
>> guides suggest.  You'd be surprised what removing a couple of conditionals
>> in often-called loops can do for performance of many things.    Looking at
>> the code, vectorizing / using native opencl data types would probably show
>> some gains as well.  Many of them seem to be straight C source conversions
>> of stuff that was optimized for x86 at some point before SSE2 existed and
>> promptly  forgotten.
>>
>> Cheers,
>> -G
>>
>

-- 
DONG, Changdao

MP: 1-412-551-2330
dongchangdao at gmail.com <cddong at cmu.edu>