[all-commits] [llvm/llvm-project] e95408: AMDGPU: Fix more unsafe rsq formation

Matt Arsenault via All-commits all-commits at lists.llvm.org
Wed Aug 23 16:29:04 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e954085f80d2146e7bbb8e3c50f8bfa455eb4e32
      https://github.com/llvm/llvm-project/commit/e954085f80d2146e7bbb8e3c50f8bfa455eb4e32
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2023-08-23 (Wed, 23 Aug 2023)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
    M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
    M llvm/lib/Target/AMDGPU/SIISelLowering.h
    M llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.mir
    M llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll
    M llvm/test/CodeGen/AMDGPU/fdiv.f16.ll
    M llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.rcp.ll
    M llvm/test/CodeGen/AMDGPU/rsq.f32.ll

  Log Message:
  -----------
  AMDGPU: Fix more unsafe rsq formation

Introducing rsq contract flags is wrong, and also requires some level
of approximate functions. AMDGPUCodeGenPrepare already should handle
the f32 cases with appropriate flags, and I don't see how new
situations to handle would arise during legalization (other than cases
involving the rcp intrinsic, which instcombine tries to
handle). AMDGPUCodeGenPrepare does need to learn better handling of
rcp/rsq for f64 though, which we never bothered to handle well.

Removes another obstacle to correctly lowering sqrt.

https://reviews.llvm.org/D158099




More information about the All-commits mailing list