[PATCH] D156301: [WIP] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.
Pravin Jagtap via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Jul 29 23:56:56 PDT 2023
pravinjagtap added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:219
+ // TODO: Support for double type
+ if (!isScanStrategyIterative() || I.getType()->isDoubleTy()) {
+ return;
----------------
pravinjagtap wrote:
> pravinjagtap wrote:
> > arsenm wrote:
> > > I think this is a bad interpretation of the strategy option. Doing nothing just because you wanted something else is worse than just using an implemented path. Also you can just implement this with dpp?
> > > Also you can just implement this with dpp?
> >
> > If I understand correctly, current dpp intrinsics that we need for reduction & scan(`llvm.amdgcn.update.dpp`) can return only `integer` types (accepts inputs with any types). @foad Is it possible to extend current dpp implementation for float types as well ?
> > > Also you can just implement this with dpp?
> >
> > If I understand correctly, current dpp intrinsics that we need for reduction & scan(`llvm.amdgcn.update.dpp`) can return only `integer` types (accepts inputs with any types).
>
> I was wrong: this intrinsic is lowered to V_MOV_B32_dpp when matched with i32 types. I think we should be able to implement DPP for floats with some bitcast noise.
I am able to generate functionally correct code for the scan with the DPP strategy, but it needs a lot of bitcast noise around `llvm.amdgcn.set.inactive.i32` and `llvm.amdgcn.update.dpp.i32`. Is there a better way of doing this?
```
%16 = bitcast float %9 to i32
%17 = call i32 @llvm.amdgcn.set.inactive.i32(i32 %16, i32 0)
%18 = bitcast i32 %17 to float
%19 = bitcast i32 %16 to float
%20 = bitcast float %18 to i32
%21 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %20, i32 273, i32 15, i32 15, i1 false)
%22 = bitcast i32 %21 to float
%23 = bitcast i32 %20 to float
%24 = fadd float %23, %22
%25 = bitcast float %24 to i32
%26 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %25, i32 274, i32 15, i32 15, i1 false)
%27 = bitcast i32 %26 to float
%28 = bitcast i32 %25 to float
%29 = fadd float %28, %27
%30 = bitcast float %29 to i32
%31 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %30, i32 276, i32 15, i32 15, i1 false)
%32 = bitcast i32 %31 to float
%33 = bitcast i32 %30 to float
%34 = fadd float %33, %32
%35 = bitcast float %34 to i32
%36 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %35, i32 280, i32 15, i32 15, i1 false)
%37 = bitcast i32 %36 to float
%38 = bitcast i32 %35 to float
%39 = fadd float %38, %37
%40 = bitcast float %39 to i32
%41 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %40, i32 322, i32 10, i32 15, i1 false)
%42 = bitcast i32 %41 to float
%43 = bitcast i32 %40 to float
%44 = fadd float %43, %42
%45 = bitcast float %44 to i32
%46 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %45, i32 323, i32 12, i32 15, i1 false)
%47 = bitcast i32 %46 to float
%48 = bitcast i32 %45 to float
%49 = fadd float %48, %47
%50 = bitcast float %49 to i32
%51 = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %50, i32 312, i32 15, i32 15, i1 false)
%52 = bitcast i32 %51 to float
%53 = bitcast float %49 to i32
%54 = call i32 @llvm.amdgcn.readlane(i32 %53, i32 63)
%55 = bitcast i32 %54 to float
%56 = call float @llvm.amdgcn.strict.wwm.f32(float %55)
```
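For readers following the IR above, here is a rough host-side C++ model of what the `update.dpp` chain appears to compute. It is a sketch under assumptions, not code from the patch: it assumes wave64, rows of 16 lanes, and that the DPP controls 273/274/276/280 decode to row_shr:1/2/4/8 and 322/323 to row_bcast:15/31 (with row masks 0xA and 0xC). The names `rowShr`, `rowBcast` and `inclusiveScan` are hypothetical. Lanes that receive no data take the `old` operand, which is 0 here, the identity for fadd.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWaveSize = 64; // lanes per wavefront (wave64 assumed)
constexpr std::size_t kRowSize = 16;  // lanes per DPP row (assumed)
using Wave = std::array<float, kWaveSize>;

// Models update.dpp with row_shr:n, old = 0, bound_ctrl = false:
// each lane reads the lane n positions below it within its own row;
// lanes with no source in the row keep the "old" value 0.
Wave rowShr(const Wave &v, std::size_t n) {
  assert(n > 0 && n < kRowSize);
  Wave out{}; // zero-initialised, i.e. the fadd identity
  for (std::size_t lane = 0; lane < kWaveSize; ++lane)
    if (lane % kRowSize >= n)
      out[lane] = v[lane - n];
  return out;
}

// Models row_bcast:15 (span = 16) and row_bcast:31 (span = 32):
// the last lane of the preceding span is broadcast to every lane
// whose row is enabled in rowMask; other lanes keep "old" = 0.
Wave rowBcast(const Wave &v, std::size_t span, unsigned rowMask) {
  assert(span == 16 || span == 32);
  Wave out{};
  for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
    const bool rowEnabled = ((rowMask >> (lane / kRowSize)) & 1u) != 0;
    if (rowEnabled && lane >= span)
      out[lane] = v[(lane / span) * span - 1];
  }
  return out;
}

// Lane-wise fadd of two wave values.
Wave add(const Wave &a, const Wave &b) {
  Wave out{};
  for (std::size_t lane = 0; lane < kWaveSize; ++lane)
    out[lane] = a[lane] + b[lane];
  return out;
}

// Inclusive prefix sum over the wave, mirroring the order of the
// update.dpp + fadd pairs in the IR.
Wave inclusiveScan(Wave v) {
  for (std::size_t n : {1u, 2u, 4u, 8u})
    v = add(v, rowShr(v, n));       // scan within each 16-lane row
  v = add(v, rowBcast(v, 16, 0xA)); // rows 1 and 3 pick up rows 0 and 2
  v = add(v, rowBcast(v, 32, 0xC)); // rows 2 and 3 pick up lane 31
  return v;
}
```

In this model the value in lane 63 after `inclusiveScan` is the full reduction, which is what the `readlane(..., 63)` at the end of the IR reads back.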
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D156301/new/
https://reviews.llvm.org/D156301