[PATCH] D71128: [NVPTX][FIX] Expand atomics we cannot handle natively in the ISA

Mon Dec 9 13:40:22 PST 2019

jdoerfert marked 5 inline comments as done.
jdoerfert added a comment.

I will try to look into the problematic lowerings @tra pointed out (thanks btw!). Any hints to why they are expanded are appreciated :)

================
Comment at: llvm/test/Transforms/AtomicExpand/NVPTX/expand-atomic-rmw-fadd.ll:16
+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast float [[LOADED]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = cmpxchg i32* [[TMP2]], i32 [[TMP4]], i32 [[TMP3]] seq_cst seq_cst
+; CHECK-NEXT:    [[SUCCESS:%.*]] = extractvalue { i32, i1 } [[TMP5]], 1
----------------
tra wrote:
> Don't we want to preserve `atomicrmw fadd` in this case and lower it to `atom.add.f32` ? Why do we want to expand here?
Same as below.

================
Comment at: llvm/test/Transforms/AtomicExpand/NVPTX/expand-atomic-rmw-fadd.ll:131
+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast double [[LOADED]] to i64
+; CHECK-NEXT:    [[TMP5:%.*]] = cmpxchg i64* [[TMP2]], i64 [[TMP4]], i64 [[TMP3]] seq_cst seq_cst
+; CHECK-NEXT:    [[SUCCESS:%.*]] = extractvalue { i64, i1 } [[TMP5]], 1
----------------
tra wrote:
> Ditto here and below. We do have `atom.add.f64`
To be honest, I don't even know why we do not match it. All I (tried) to do is add the limits wrt. size and alignment. Somehow that had more effect than I wanted. The new hooks already remove some of the weirdness we saw but it seems something is missing here (maybe during the instruction "registration").

================
Comment at: llvm/test/Transforms/AtomicExpand/NVPTX/expand-atomic-rmw-fsub.ll:5
+
+define float @test_atomicrmw_fadd_f32_flat(float* %ptr, float %value) {
+; CHECK-LABEL: @test_atomicrmw_fadd_f32_flat(
----------------
tra wrote:
> Functilon name `fadd` does not seem to match the instruction `fsub`. 
Good catch, copy & paste from the AMD tests ;)  (@arsenm 

================
Comment at: llvm/test/Transforms/AtomicExpand/NVPTX/expand-atomic-rmw-fsub.ll:15
+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast float [[LOADED]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = cmpxchg i32* [[TMP2]], i32 [[TMP4]], i32 [[TMP3]] seq_cst seq_cst
+; CHECK-NEXT:    [[SUCCESS:%.*]] = extractvalue { i32, i1 } [[TMP5]], 1
----------------
tra wrote:
> I must be missing something -- I would think that we do *not* want to expand atomicrmw variants which we can lower to an existing instruction, but a lot of the tests show the opposite and expand atomics that have direct support in hardware. The patch subject seems to agree with my assumptions, but the tests appear to contradict it. Is that intentional? If so, what is it that I'm missing? 
It is not intentional to pesimise anything, as mentioned above. The problem is I am neither a backend nor NVPTX person and my changes do seem to have unwanted effects I cannot even categorize.

================
Comment at: llvm/test/Transforms/AtomicExpand/NVPTX/unaligned-atomic.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -mtriple=nvptx64-unknown-unknown -atomic-expand %s | FileCheck -check-prefix=CHECK %s
+
----------------
tra wrote:
> Nit: no need for `-check-prefix` as you only using `CHECK` in the test.
Fair, I think I copied this ;)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71128/new/

https://reviews.llvm.org/D71128