[PATCH] D59556: [AMDGPU] Fixed i64 add/sub used in lowering of i64 srem
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 20 07:55:49 PDT 2019
arsenm added a comment.
In D59556#1436285 <https://reviews.llvm.org/D59556#1436285>, @tpr wrote:
> The test is already reduced as much as I can. Removing anything in there makes the problem disappear. Constructing a new test case using llvm.uadd.with.overflow does not show the problem. Can we go with this test case?
I managed to reproduce it with this:
define amdgpu_kernel void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %a.ptr, i32 addrspace(1)* %b.ptr, float %dummy.val) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%tid.ext = sext i32 %tid to i64
%a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr
%b.gep = getelementptr inbounds i32, i32 addrspace(1)* %b.ptr
%a = load volatile i32, i32 addrspace(1)* %a.gep, align 4
%b = load volatile i32, i32 addrspace(1)* %b.gep, align 4
%uadd0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
%val0 = extractvalue { i32, i1 } %uadd0, 0
%carry0 = extractvalue { i32, i1 } %uadd0, 1
store volatile i32 %val0, i32 addrspace(1)* %out, align 4
store i1 %carry0, i1 addrspace(1)* %carryout
; Force a use of an i1 0 that will be materialized in a register,
; which will be selected before the uaddo (so its operand is
; replaced with the materialized node)
%fmas = call float @llvm.amdgcn.div.fmas.f32(float %dummy.val, float %dummy.val, float %dummy.val, i1 false)
store volatile float %fmas, float addrspace(1)* null
ret void
}
declare float @llvm.amdgcn.div.fmas.f32(float, float, float, i1)
declare i32 @llvm.amdgcn.workitem.id.x() #1
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) #1
attributes #0 = { nounwind }
attributes #1 = { nounwind readnone }
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59556/new/
https://reviews.llvm.org/D59556