[llvm-bugs] [Bug 46212] New: [AMDGPU] integer division off-by-one for some inputs
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Jun 4 17:03:40 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=46212
Bug ID: 46212
Summary: [AMDGPU] integer division off-by-one for some inputs
Product: libraries
Version: trunk
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Backend: AMDGPU
Assignee: unassignedbugs at nondot.org
Reporter: tilkax at gmail.com
CC: llvm-bugs at lists.llvm.org
For this IR:
define i32 @divide(i32, i32) {
%3 = udiv i32 %0, %1
ret i32 %3
}
LLVM generates this code (-mcpu=gfx1010):
v_cvt_f32_u32_e32 v2, v1
v_rcp_f32_e32 v2, v2
s_mov_b32 s4, 0x4f800000
v_mul_f32_e32 v2, s4, v2
v_cvt_u32_f32_e32 v2, v2
v_mul_hi_u32 v3, v2, v1
v_mul_lo_u32 v4, v2, v1
s_mov_b32 s4, 0
v_sub_nc_u32_e32 v5, s4, v4
v_cmp_eq_u32_e64 s4, s4, v3
v_cndmask_b32_e64 v3, v4, v5, s4
v_mul_hi_u32 v3, v3, v2
v_add_nc_u32_e32 v4, v2, v3
v_sub_nc_u32_e32 v2, v2, v3
v_cndmask_b32_e64 v2, v2, v4, s4
v_mul_hi_u32 v2, v2, v0
v_mul_lo_u32 v3, v2, v1
v_sub_nc_u32_e32 v4, v0, v3
v_cmp_ge_u32_e64 s4, v4, v1
v_cmp_ge_u32_e64 s5, v0, v3
s_and_b32 s4, s4, s5
s_mov_b32 s6, 1
v_add_nc_u32_e32 v0, s6, v2
s_mov_b32 s6, -1
v_add_nc_u32_e32 v1, s6, v2
v_cndmask_b32_e64 v0, v2, v0, s4
v_cndmask_b32_e64 v0, v1, v0, s5
For similar HLSL source, the AMD DirectX driver generates better code but uses
the same algorithm:
v_cvt_f32_u32 v3, v2 // 000000000018:
7E060D02
v_rcp_f32 v3, v3 // 00000000001C:
7E065503
v_mul_f32 v3, lit(0x4f800000), v3 // 000000000020:
100606FF 4F800000
v_cvt_u32_f32 v3, v3 // 000000000028:
7E060F03
v_mad_u64_u32 v[4:5], vcc, v2, v3, 0 // 00000000002C:
D5766A04 02020702
v_cmp_ne_i32 s[0:1], 0, v5 // 000000000034:
7D0A0AF9 06868080
v_sub_co_u32 v6, vcc, 0, v4 // 00000000003C:
D7106A06 00020880
v_cndmask_b32 v4, v6, v4, s[0:1] // 000000000044:
D5010004 00020906
v_mul_hi_u32 v4, v4, v3 // 00000000004C:
D56A0004 00020704
v_sub_co_u32 v5, vcc, v3, v4 // 000000000054:
D7106A05 00020903
v_add_nc_u32 v4, v3, v4 // 00000000005C:
4A080903
v_cndmask_b32 v1, v4, v5, s[0:1] // 000000000060:
D5010001 00020B04
v_mul_hi_u32 v1, v1, v0 // 000000000068:
D56A0001 00020101
v_mul_lo_u32 v3, v1, v2 // 000000000070:
D5690003 00020501
v_sub_nc_u32 v5, v0, v3 // 000000000078:
4C0A0700
v_cmp_ge_u32 s[0:1], v0, v3 // 00000000007C:
7D8C06F9 06068000
v_cmp_ge_u32 s[2:3], v5, v2 // 000000000084:
7D8C04F9 06068205
s_and_b64 s[2:3], s[0:1], s[2:3] // 00000000008C:
87820200
v_add_co_ci_u32 v0, vcc, v1, 0, s[2:3] // 000000000090:
D5286A00 00090101
v_add_co_ci_u32 v0, vcc, v0, -1, s[0:1] // 000000000098:
D5286A00 00018300
v_cmp_ne_i32 vcc, 0, v2 // 0000000000A0:
7D0A0480
v_cndmask_b32 v0, -1, v0, vcc // 0000000000A4:
020000C1
Using PIX for Windows, I verified that for 0xFFFFFFFF / 0x11111111 this code
returns 14.
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20200605/6d4bbf8a/attachment-0001.html>
More information about the llvm-bugs
mailing list