[PATCH] D141054: [NVPTX] Set default version of architecture to SM_30, PTX to 6.0.

Fri Jan 6 14:39:06 PST 2023

tra added inline comments.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:2
 # RUN: %python %s --target=cuda --tests=suld,sust,tex,tld4 --gen-list=%t.list > %t-cuda.ll
-# RUN: llc %t-cuda.ll -verify-machineinstrs -o - | FileCheck %t-cuda.ll
-# RUN: %if ptxas %{ llc %t-cuda.ll -verify-machineinstrs -o - | %ptxas-verify %}
+# RUN: llc -mcpu=sm_20 %t-cuda.ll -verify-machineinstrs -o - | FileCheck %t-cuda.ll
+# RUN: %if ptxas %{ llc -mcpu=sm_20 %t-cuda.ll -verify-machineinstrs -o - | %ptxas-verify %}
----------------
asavonic wrote:
> tra wrote:
> > pavelkopyl wrote:
> > > tra wrote:
> > > > We may as well change it to `sm_30`, too. sm_20 is gone for all practical purposes. Even sm_30 is on the way out.
> > > Changing the version to sm_30 also changes the generated code. The reason is that sm_20 (Fermi) has no support for image handles, so the NVPTX backend runs a special pass, NVPTXReplaceImageHandles, to work around this.
> > > Textually, the difference in the generated code looks like this:
> > > 
> > > 
> > > ```
> > > .global .surfref gsurf;
> > > ```
> > > sm_20:
> > > 
> > > ```
> > > suld.b.1d.b8.trap {%rs1}, [gsurf, {%r1}];
> > > ```
> > > 
> > > sm_30:
> > > ```
> > > mov.u64 %rd3, gsurf;
> > > suld.b.1d.b8.trap {%rs1}, [%rd3, {%r1}];
> > > ```
> > > 
> > > We can change this test to support sm_30. But I think it's better to do this in another review. Is that OK?
> > OK. We can keep sm_20 test around for now. 
> > The reason is that sm_20 (Fermi) has no support for image handles, so the NVPTX backend runs a special pass, NVPTXReplaceImageHandles, to work around this.
> 
> Can we run NVPTXReplaceImageHandles for other targets as well? These extra `mov` instructions are unnecessary, and the pass can eliminate them. 
The moves do not matter all that much. ptxas is an optimizing assembler and can deal with them. If we really want to get rid of them, a better way would be to not generate them in the first place, which should be done during lowering rather than by running an additional pass.
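
To make the sm_20 vs. sm_30 difference quoted above concrete, here is a minimal Python sketch of how a generator script could pick the expected PTX lines for the `suld` example. The helper name `expected_suld_lines` and its `direct_surfref` parameter are hypothetical and are not taken from surf-tex.py; the register names are simply copied from the snippets quoted above.

```python
# Hypothetical helper, NOT part of surf-tex.py: it only restates the two
# PTX snippets quoted in this thread as data a test generator could emit.
def expected_suld_lines(direct_surfref, handle="gsurf"):
    """Return the PTX lines expected for the 1D b8 surface load example.

    direct_surfref=True  models the sm_20 output quoted above: the surface
                         symbol is used directly as the operand.
    direct_surfref=False models the sm_30 output quoted above: the handle
                         is first moved into a 64-bit register.
    """
    if direct_surfref:
        return ["suld.b.1d.b8.trap {%rs1}, [" + handle + ", {%r1}];"]
    return [
        "mov.u64 %rd3, " + handle + ";",
        "suld.b.1d.b8.trap {%rs1}, [%rd3, {%r1}];",
    ]


if __name__ == "__main__":
    for direct in (True, False):
        print("\n".join(expected_suld_lines(direct)))
        print()
```

The only difference between the two variants is the extra `mov.u64`, which is the instruction the discussion above is about.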

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141054/new/

https://reviews.llvm.org/D141054