<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi,<br>

      <br>

      First of all, I'm not sure whether I should be posting this here or on

      cfe-dev, but here it goes.<br>

      <br>

      To instrument CUDA kernels, I first generate device IR

      with:<br>

      <br>

      <font face="Courier New, Courier, monospace">clang++ -x cuda

        --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc<br>

      </font><br>

      I also have a library that contains the instrumentation stubs. I

      generate IR for it in the same way and link it with the device IR

      programmatically using <font face="Courier New, Courier,

        monospace">Linker::linkModules(..)</font>.<br>

      <br>

      Then, after some analysis, I use llc to generate PTX:<br>

      <br>

      <font face="Courier New, Courier, monospace">llc device.bc

        --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx</font><br>

      <br>

      This works fine, but the instrumentation code

      uses <font face="Courier New, Courier, monospace">__shfl_sync()</font>,

      and ptxas rejects the generated PTX with the following error:<br>

      <br>

      <font face="Courier New, Courier, monospace">ptxas device.ptx,

        line 1033; error   : Feature 'shfl.sync' requires PTX ISA

        .version 6.0 or later</font><br>

      <br>

      Now, according to

<a class="moz-txt-link-freetext" href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions">https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions</a>,

      __shfl_sync is supported by compute capability &gt;= 3, and

      according to <a class="moz-txt-link-freetext" href="https://developer.nvidia.com/cuda-gpus#compute">https://developer.nvidia.com/cuda-gpus#compute</a> my

      GTX 950 has compute capability 5.2.<br>

      <br>

      Also, according to

<a class="moz-txt-link-freetext" href="https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes">https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes</a>,

      PTX ISA 6.0 does support sm_52.<br>

      <br>

      However, llc emits the following PTX header:<br>

      <br>

      <font face="Courier New, Courier, monospace">.version 4.1<br>

        .target sm_52, debug<br>

        .address_size 64</font><br>

      <br>
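      For reference, the mismatch can be checked before running ptxas by
      reading the declared version straight from the PTX header. This is
      only a minimal sketch: the helper names ptx_version and
      ptx_version_at_least are made up for illustration, and it assumes
      the file contains a single ".version major.minor" line like the one
      above.<br>
      <br>

```shell
# Hypothetical helper: print the PTX ISA version declared in a .ptx file.
# Assumes the header contains a line of the form ".version MAJOR.MINOR".
ptx_version() {
    awk '$1 == ".version" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed if version $1 is at least version $2.
# Both arguments are assumed to be plain MAJOR.MINOR strings, e.g. 4.1 or 6.0.
ptx_version_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}
    want_major=${2%%.*}; want_minor=${2#*.}
    if [ "$have_major" -ne "$want_major" ]; then
        # Major versions differ, so the major number decides.
        [ "$have_major" -gt "$want_major" ]
    else
        # Same major version, so compare the minor number.
        [ "$have_minor" -ge "$want_minor" ]
    fi
}
```

      <br>
      With the header above, <font face="Courier New, Courier, monospace">ptx_version
        device.ptx</font> prints 4.1, so <font face="Courier New, Courier, monospace">ptx_version_at_least
        "$(ptx_version device.ptx)" 6.0</font> fails, which matches the
      ptxas complaint.<br>
      <br>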

      Any idea why this is happening, or am I doing something wrong?<br>

      <br>

      P.S. I'm using CUDA 10 with driver 440.<br>

      <br>

      ~George<br>

    </p>

  </body>

</html>