[PATCH] D28719: [NVPTX] Improve lowering of llvm.ctlz.
Justin Lebar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 17 14:28:22 PST 2017
jlebar marked an inline comment as done.
jlebar added inline comments.
================
Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:2633
+def : Pat<(i32 (zext (ctlz Int16Regs:$a))),
+ (SUBi32ri (CLZr32 (CVT_u32_u16 Int16Regs:$a, CvtNONE)), 16)>;
----------------
tra wrote:
> PTX has `mov.b32 %dest, {%src1, %src2}`
> Instead of explicit conversion + subtracting 16, perhaps we could do something like this:
> ```
> mov.b32 %t, {%src, 0xffff}
> clz.b32 %result, %t
> ```
> I'm not sure whether it makes any difference in SASS, though.
Oh, that is sneaky. I like it. It saves one SASS instruction.
Orig:
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x64c03c00089c0006 */
/*0010*/ LDC.U16 R0, c[0x0][0x140]; /* 0x7c900000a01ffc02 */
/*0018*/ FLO.U32 R0, R0; /* 0xe1800000001c0002 */
/*0020*/ ISUB R0, 0x1f, R0; /* 0xc09000000f9c0001 */
/*0028*/ I2I.U16.U32 R2, R0; /* 0xe6000000001c240a */
/*0030*/ MOV R0, c[0x0][0x144]; /* 0x64c03c00289c0002 */
/*0038*/ IADD R2, R2, -0x10; /* 0xc88003fff81c0809 */
/* 0x080000000000b810 */
/*0048*/ ST.U16 [R0], R2; /* 0xe2000000001c0008 */
/*0050*/ EXIT; /* 0x18000000001c003c */
Clever hack:
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x64c03c00089c0006 */
/*0010*/ LDC.U16 R0, c[0x0][0x140]; /* 0x7c900000a01ffc02 */
/*0018*/ ISCADD R0, R0, 0xffff, 0x10; /* 0xc0c0407fff9c0001 */
/*0020*/ FLO.U32 R2, R0; /* 0xe1800000001c000a */
/*0028*/ MOV R0, c[0x0][0x144]; /* 0x64c03c00289c0002 */
/*0030*/ ISUB R2, 0x1f, R2; /* 0xc09000000f9c0809 */
/*0038*/ ST.U16 [R0], R2; /* 0xe2000000001c0008 */
/* 0x08000000000000b8 */
/*0048*/ EXIT; /* 0x18000000001c003c */
However, we don't currently have a mechanism to generate `mov.b32 b32reg, {imm, b16reg}`. If it's OK with you, I'll just leave a TODO. Clever as it is, I seriously doubt it will ever matter.
https://reviews.llvm.org/D28719