[PATCH] D59556: [AMDGPU] Fixed i64 add/sub used in lowering of i64 srem

Wed Mar 20 07:55:49 PDT 2019

arsenm added a comment.

In D59556#1436285 <https://reviews.llvm.org/D59556#1436285>, @tpr wrote:

> The test is already reduced as much as I can. Removing anything in there makes the problem disappear. Constructing a new test case using llvm.uadd.with.overflow does not show the problem. Can we go with this test case?

I managed to reproduce it with this:

  define amdgpu_kernel void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %a.ptr, i32 addrspace(1)* %b.ptr, float %dummy.val) #0 {
    %tid = call i32 @llvm.amdgcn.workitem.id.x()
    %tid.ext = sext i32 %tid to i64
    %a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr
    %b.gep = getelementptr inbounds i32, i32 addrspace(1)* %b.ptr
    %a = load volatile i32, i32 addrspace(1)* %a.gep, align 4
    %b = load volatile i32, i32 addrspace(1)* %b.gep, align 4
    %uadd0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
    %val0 = extractvalue { i32, i1 } %uadd0, 0
    %carry0 = extractvalue { i32, i1 } %uadd0, 1
    store volatile i32 %val0, i32 addrspace(1)* %out, align 4
    store i1 %carry0, i1 addrspace(1)* %carryout

    ; Force a use of an i1 0 that will be materialized in a register,
    ; which will be selected before the uaddo (so its operand is
    ; replaced with the materialized node)
    %fmas = call float @llvm.amdgcn.div.fmas.f32(float %dummy.val, float %dummy.val, float %dummy.val, i1 false)
    store volatile float %fmas, float addrspace(1)* null
    ret void
  }

  declare float @llvm.amdgcn.div.fmas.f32(float, float, float, i1)
  declare i32 @llvm.amdgcn.workitem.id.x() #1
  declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) #1

  attributes #0 = { nounwind }
  attributes #1 = { nounwind readnone }

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59556/new/

https://reviews.llvm.org/D59556