[PATCH] D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+.

Artem Belevich via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Tue Mar 19 11:27:35 PDT 2019


tra marked an inline comment as done.
tra added inline comments.


================
Comment at: llvm/test/CodeGen/NVPTX/wmma.py:244
+
 main()
----------------
timshen wrote:
> tra wrote:
> > timshen wrote:
> > > timshen wrote:
> > > > Who is supposed to run this script? Can we check-in the result of this script and make them part of the regression tests?
> > > > 
> > > > Relatedly, for other backends we have a framework for it. See `llvm/utils/update_llc_test_checks.py`. The generated file looks like `llvm/test/CodeGen/PowerPC/atomics-regression.ll`.
> > > > 
> > > > One of the advantages of checking in the generated file is that subsequent behavioral changes are reflected in the patch.
> > > > Who is supposed to run this script?
> > > 
> > > I guess I can answer this part - lit. Still, it'd be great to check in the generated .ll files with RUN lines in them.
> > The script is executed by lit, which then runs llc on the generated output and checks the resulting PTX.
> > 
> > I'm not convinced that committing the generated .ll has much value -- it's in the ballpark of a megabyte of uninteresting boilerplate, mostly enumerating 4- and 8-tuples of arguments and results. Upcoming changes will bring more supported types and will multiply the amount of generated .ll without making it any more interesting for humans.
> > 
> > e.g. just one function out of *a lot*:
> > 
> > ```
> > declare {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8(i8 addrspace(3)* %src );
> > 
> > ; CHECK-LABEL: .func {{.*}}test_llvm_nvvm_wmma_m16n16k16_load_a_row_f16_p3i8(
> > define {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @test_llvm_nvvm_wmma_m16n16k16_load_a_row_f16_p3i8(i8 addrspace(3)* %src ) {
> > ; CHECK: wmma.load.a.sync.aligned.row.m16n16k16.shared.f16
> > ; CHECK: {{{%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+}}}
> > ; CHECK: [%rd{{[0-9]+}}]
> >   %v0 = call {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8(i8 addrspace(3)* %src );
> >   ret {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} %v0;
> > }
> > 
> > ```
> > 
> > It's easy enough to regenerate it, or to grab the one produced by lit -- the name would be right there in the failing command.
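For concreteness, the lit plumbing behind that is just a couple of RUN lines at the top of the generator. The sketch below is illustrative only; the exact RUN lines, llc flags, and lit substitutions in the actual test may differ:

```python
# Illustrative only -- not the exact RUN lines checked in with wmma.py.
# lit materializes the generated IR into a temp file (%t.ll), feeds it to
# llc, and FileCheck verifies the PTX against the CHECK lines the script
# embedded in its own output:
#
# RUN: %python %s > %t.ll
# RUN: llc < %t.ll -march=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | FileCheck %t.ll
```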
> > I'm not convinced that committing the generated .ll has much value -- it's in the ballpark of a megabyte of uninteresting boilerplate, mostly enumerating 4- and 8-tuples of arguments and results. Upcoming changes will bring more supported types and will multiply the amount of generated .ll without making it any more interesting for humans.
> 
> test/CodeGen/X86 already does this:
> ```
> ...src/llvm-project/llvm % wc -c `grep 'autogenerated by utils/update_llc' -r test/CodeGen/X86 -l` | tail -1
> 37287954 total
> ```
> 
> It has 37MB of autogenerated .ll files, presumably because of its massive set of intrinsics.
> 
> > e.g. just one function out of *a lot*:
> 
> Compared to `test/CodeGen/X86/avx512vl-vec-masked-cmp.ll` it isn't that bad. ;)
> 
> 
> However, I realized that `wmma.py` is somewhat different from `utils/update_llc_test_checks.py`.
> 
> What `utils/update_llc_test_checks.py` does:
> * Run llc on the *arbitrary* input IR and get the asm output.
> * Use regex replacement to turn the asm into CHECK lines. The regexes are different for different targets.
> * Print out the .ll file with those CHECK lines.
> 
> What `wmma.py` does:
> * Enumerate all possible combinations of wmma IR inputs.
> * Generate the CHECK lines directly using the same wmma-specific knowledge that generates the IR.
> * Print out the .ll file with the CHECK lines.
> 
> The key difference is that `update_llc_test_checks.py` isn't wmma-specific.
> 
> Another crucial difference is that wmma.py generates very generic check lines like `[%rd{{[0-9]+}}]`, while `update_llc_test_checks.py` usually prints out the exact literal it extracts from the asm result, e.g. `%rd1`.
> 
> As a result, wmma.py's output isn't as readable as I thought it would be (fewer literals), so I'm fine without checking in the wmma.py-generated files.
> 
> However, I'd encourage some of the NVPTX contributors (!) to add NVPTX support to `update_llc_test_checks.py`. With that, we could support wmma.py almost for free, along with all other kinds of PTX regression tests.
IIUC, `update_llc_test_checks.py` effectively freezes the output generated by llc *now* so it can be checked for regressions later.

wmma.py's use case is different, at least for me -- I use it as a way to *create* the reference output that llc can't generate yet, and then use it to make sure my NVPTX back-end changes do the right thing.
That said, once the back-end functionality is implemented, it becomes just a 'compare to the reference' test, and the task of generating CHECK lines can indeed be offloaded to `update_llc_test_checks.py`.
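To make that concrete, here is a minimal sketch of the generator pattern (made-up helper name, not the actual wmma.py code): the same parameter tuple produces both the intrinsic name in the IR and the expected PTX mnemonic, which is why the register checks can stay generic instead of pinning exact literals like `%rd1`.

```python
# Minimal sketch with a made-up helper name -- not the actual wmma.py code.
# The same (geometry, fragment, layout, type) tuple drives both the intrinsic
# in the generated IR and the expected PTX mnemonic, so the CHECK lines can
# stay deliberately loose about register numbering.
def make_load_checks(geom, frag, layout, itype):
    intrinsic = "llvm.nvvm.wmma.%s.load.%s.%s.%s.p3i8" % (geom, frag, layout, itype)
    func = "test_" + intrinsic.replace(".", "_")
    return "\n".join([
        "; CHECK-LABEL: .func {{.*}}%s(" % func,
        "; CHECK: wmma.load.%s.sync.aligned.%s.%s.shared.%s" % (frag, layout, geom, itype),
        "; CHECK: [%rd{{[0-9]+}}]",  # any %rd register is accepted
    ])

print(make_load_checks("m16n16k16", "a", "row", "f16"))
```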

I'll think about splitting these two use cases. Perhaps I should keep the script to aid development but, once the back-end work is done, generate reference .ll files for the implemented intrinsics and let `update_llc_test_checks.py` generate the checks for the emitted PTX.
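One possible shape for that split, as a rough sketch -- the flag and helpers below are hypothetical and not part of wmma.py:

```python
# Hypothetical sketch of the split: the --emit-checks flag and these helpers
# do not exist in wmma.py; they only illustrate the idea of emitting
# self-checking IR while developing, and plain reference IR afterwards so a
# tool like update_llc_test_checks.py could take over the CHECK lines.
import argparse

def generate_tests():
    # Placeholder standing in for the script's real enumeration of
    # geometries, fragments, layouts, and types.
    yield ("define void @test_stub() { ret void }",
           "; CHECK-LABEL: .func {{.*}}test_stub(")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--emit-checks", action="store_true",
                        help="development mode: print hand-crafted CHECK lines "
                             "alongside the IR; omit to emit plain .ll")
    args = parser.parse_args()
    for ir, checks in generate_tests():
        if args.emit_checks:
            print(checks)
        print(ir)

main()
```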


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59393/new/

https://reviews.llvm.org/D59393