[PATCH] D156301: [WIP] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Sat Jul 29 23:56:56 PDT 2023

pravinjagtap added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:219
+    // TODO: Support for double type
+    if (!isScanStrategyIterative() || I.getType()->isDoubleTy()) {
+      return;
----------------
pravinjagtap wrote:
> pravinjagtap wrote:
> > arsenm wrote:
> > > I think this is a bad interpretation of the strategy option. Doing nothing just because you wanted something else is worse than just using an implemented path. Also you can just implement this with dpp?
> > > Also you can just implement this with dpp?
> > 
> > If I understand correctly, the DPP intrinsic that we need for reduction and scan (`llvm.amdgcn.update.dpp`) can only return integer types, although it accepts inputs of any type. @foad, is it possible to extend the current DPP implementation to float types as well?
> 
> I was wrong: this intrinsic is lowered to V_MOV_B32_dpp when matched with i32 types. I think we should be able to implement DPP for floats with some bitcast noise.
I am able to generate functionally correct code for the scan with the DPP strategy, but it needs a lot of bitcasts around `llvm.amdgcn.set.inactive.i32` and `llvm.amdgcn.update.dpp.i32`. Is there a better way of doing this?

```
  %16 = bitcast float %9 to i32
  %17 = call i32 @llvm.amdgcn.set.inactive.i32(i32 %16, i32 0)
  %18 = bitcast i32 %17 to float
  %19 = bitcast i32 %16 to float
  %20 = bitcast float %18 to i32
  %21 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %20, i32 273, i32 15, i32 15, i1 false)
  %22 = bitcast i32 %21 to float
  %23 = bitcast i32 %20 to float
  %24 = fadd float %23, %22
  %25 = bitcast float %24 to i32
  %26 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %25, i32 274, i32 15, i32 15, i1 false)
  %27 = bitcast i32 %26 to float
  %28 = bitcast i32 %25 to float
  %29 = fadd float %28, %27
  %30 = bitcast float %29 to i32
  %31 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %30, i32 276, i32 15, i32 15, i1 false)
  %32 = bitcast i32 %31 to float
  %33 = bitcast i32 %30 to float
  %34 = fadd float %33, %32
  %35 = bitcast float %34 to i32
  %36 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %35, i32 280, i32 15, i32 15, i1 false)
  %37 = bitcast i32 %36 to float
  %38 = bitcast i32 %35 to float
  %39 = fadd float %38, %37
  %40 = bitcast float %39 to i32
  %41 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %40, i32 322, i32 10, i32 15, i1 false)
  %42 = bitcast i32 %41 to float
  %43 = bitcast i32 %40 to float
  %44 = fadd float %43, %42
  %45 = bitcast float %44 to i32
  %46 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %45, i32 323, i32 12, i32 15, i1 false)
  %47 = bitcast i32 %46 to float
  %48 = bitcast i32 %45 to float
  %49 = fadd float %48, %47
  %50 = bitcast float %49 to i32
  %51 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %50, i32 312, i32 15, i32 15, i1 false)
  %52 = bitcast i32 %51 to float
  %53 = bitcast float %49 to i32
  %54 = call i32 @llvm.amdgcn.readlane(i32 %53, i32 63)
  %55 = bitcast i32 %54 to float
  %56 = call float @llvm.amdgcn.strict.wwm.f32(float %55)
```
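
For illustration only, here is a host-side C++ sketch of the pattern in the IR above: the data movement goes through 32-bit integer bit patterns, and only the add is done on floats. It is not code from the patch and does not use any LLVM API. All names (`floatToBits`, `bitsToFloat`, `rowShiftRight`, `rowShiftRightF32`, `rowInclusiveScan`) are hypothetical. The shift model is a simplification that assumes lanes with no source lane receive the `old` operand, and it covers only the first four steps within a single 16-lane row (the DPP controls 273, 274, 276 and 280, which I read as row_shr by 1, 2, 4 and 8), not the row broadcast or wave shift steps.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kRowSize = 16; // lanes in one row (assumed model)

using BitRow = std::array<std::uint32_t, kRowSize>;
using FloatRow = std::array<float, kRowSize>;

// Reinterpret the bits of a float as a 32-bit integer, like
// `bitcast float %x to i32`. No value conversion takes place.
inline std::uint32_t floatToBits(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch expects a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Inverse of floatToBits, like `bitcast i32 %x to float`.
inline float bitsToFloat(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Simplified model of an integer-only row shift: lane i receives the
// bits of lane (i - shift); lanes without a source lane receive `old`.
inline BitRow rowShiftRight(const BitRow &src, std::size_t shift,
                            std::uint32_t old) {
  assert(shift > 0 && shift < kRowSize);
  BitRow out{};
  for (std::size_t lane = 0; lane < kRowSize; ++lane)
    out[lane] = lane >= shift ? src[lane - shift] : old;
  return out;
}

// Typed wrapper: the bitcasts are confined to this one helper, so the
// scan loop below never sees an integer.
inline FloatRow rowShiftRightF32(const FloatRow &src, std::size_t shift,
                                 float identity) {
  BitRow bits{};
  for (std::size_t lane = 0; lane < kRowSize; ++lane)
    bits[lane] = floatToBits(src[lane]);
  const BitRow moved = rowShiftRight(bits, shift, floatToBits(identity));
  FloatRow out{};
  for (std::size_t lane = 0; lane < kRowSize; ++lane)
    out[lane] = bitsToFloat(moved[lane]);
  return out;
}

// Inclusive prefix sum over one row using shifts of 1, 2, 4 and 8.
inline FloatRow rowInclusiveScan(FloatRow v) {
  for (std::size_t shift = 1; shift < kRowSize; shift *= 2) {
    const FloatRow shifted = rowShiftRightF32(v, shift, 0.0f);
    for (std::size_t lane = 0; lane < kRowSize; ++lane)
      v[lane] += shifted[lane]; // corresponds to the `fadd` in the IR
  }
  return v;
}
```

The point of the typed wrapper is that the float-to-integer round trip only moves bits, so it can live in one helper instead of being repeated around every call. Whether the pass should hide the casts in a similar helper or the intrinsics should be extended to float types is still the open question.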

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156301/new/

https://reviews.llvm.org/D156301