[all-commits] [llvm/llvm-project] e95408: AMDGPU: Fix more unsafe rsq formation
Matt Arsenault via All-commits
all-commits at lists.llvm.org
Wed Aug 23 16:29:04 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: e954085f80d2146e7bbb8e3c50f8bfa455eb4e32
https://github.com/llvm/llvm-project/commit/e954085f80d2146e7bbb8e3c50f8bfa455eb4e32
Author: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: 2023-08-23 (Wed, 23 Aug 2023)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
M llvm/lib/Target/AMDGPU/SIISelLowering.h
M llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.mir
M llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll
M llvm/test/CodeGen/AMDGPU/fdiv.f16.ll
M llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll
M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.rcp.ll
M llvm/test/CodeGen/AMDGPU/rsq.f32.ll
Log Message:
-----------
AMDGPU: Fix more unsafe rsq formation
Forming rsq on the basis of contract flags alone is wrong; it also requires
some level of approximate functions. AMDGPUCodeGenPrepare should already handle
the f32 cases with appropriate flags, and I don't see how new
situations to handle would arise during legalization (other than cases
involving the rcp intrinsic, which instcombine tries to
handle). AMDGPUCodeGenPrepare does need to learn better handling of
rcp/rsq for f64 though, which we never bothered to handle well.
Removes another obstacle to correctly lowering sqrt.
https://reviews.llvm.org/D158099
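
As a rough illustration of the flag requirement described in the log message, the sketch below models one reading of it: folding 1.0 / sqrt(x) into a single rsq is treated as legal only when both contraction and approximate functions are permitted. The struct `FastMathFlagsSketch` and the function `mayFormRsq` are hypothetical names made up for this example. They are not LLVM APIs and are not taken from the patch, and the exact condition the patch checks may differ.

```cpp
#include <cassert>

// Hypothetical stand-in for the fast-math flags shared by the fdiv and the
// sqrt being combined. This is NOT LLVM's FastMathFlags class.
struct FastMathFlagsSketch {
  bool AllowContract; // models the `contract` flag
  bool ApproxFunc;    // models the `afn` flag
};

// Assumed rule, based on one reading of the log message: `contract` by itself
// is not enough to form rsq; approximate functions must be allowed as well.
constexpr bool mayFormRsq(FastMathFlagsSketch Flags) {
  return Flags.AllowContract && Flags.ApproxFunc;
}
```

Under this assumed rule, `mayFormRsq({true, false})` is false, which corresponds to the case the message calls wrong: forming rsq when only contract is set.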