[PATCH] D28719: [NVPTX] Improve lowering of llvm.ctlz.

Justin Lebar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 17 14:28:22 PST 2017


jlebar marked an inline comment as done.
jlebar added inline comments.


================
Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:2633
+def : Pat<(i32 (zext (ctlz Int16Regs:$a))),
+          (SUBi32ri (CLZr32 (CVT_u32_u16 Int16Regs:$a, CvtNONE)), 16)>;
 
----------------
tra wrote:
> PTX has `mov.b32 %dest, {%src1, %src2}`
> Instead of explicit conversion + subtracting 16, perhaps we could do something like this:
> ```
> mov.b32 %t, {0xffff, %src};
> clz.b32 %result, %t;
> ```
> I'm not sure whether it makes any difference in SASS, though.
Oh, that is sneaky.  I like it.  It saves one SASS instruction.

Original:

        /*0008*/                   MOV R1, c[0x0][0x44];       /* 0x64c03c00089c0006 */
        /*0010*/                   LDC.U16 R0, c[0x0][0x140];  /* 0x7c900000a01ffc02 */
        /*0018*/                   FLO.U32 R0, R0;             /* 0xe1800000001c0002 */
        /*0020*/                   ISUB R0, 0x1f, R0;          /* 0xc09000000f9c0001 */
        /*0028*/                   I2I.U16.U32 R2, R0;         /* 0xe6000000001c240a */
        /*0030*/                   MOV R0, c[0x0][0x144];      /* 0x64c03c00289c0002 */
        /*0038*/                   IADD R2, R2, -0x10;         /* 0xc88003fff81c0809 */
                                                               /* 0x080000000000b810 */
        /*0048*/                   ST.U16 [R0], R2;            /* 0xe2000000001c0008 */
        /*0050*/                   EXIT;                       /* 0x18000000001c003c */

Clever hack:

        /*0008*/                   MOV R1, c[0x0][0x44];         /* 0x64c03c00089c0006 */
        /*0010*/                   LDC.U16 R0, c[0x0][0x140];    /* 0x7c900000a01ffc02 */
        /*0018*/                   ISCADD R0, R0, 0xffff, 0x10;  /* 0xc0c0407fff9c0001 */
        /*0020*/                   FLO.U32 R2, R0;               /* 0xe1800000001c000a */
        /*0028*/                   MOV R0, c[0x0][0x144];        /* 0x64c03c00289c0002 */
        /*0030*/                   ISUB R2, 0x1f, R2;            /* 0xc09000000f9c0809 */
        /*0038*/                   ST.U16 [R0], R2;              /* 0xe2000000001c0008 */
                                                                 /* 0x08000000000000b8 */
        /*0048*/                   EXIT;                         /* 0x18000000001c003c */

However, we don't currently have a mechanism to generate `mov.b32 b32reg, {imm, b16reg}`.  If it's OK with you, I'll just leave a TODO.  Clever as it is, I seriously doubt it will ever matter.


https://reviews.llvm.org/D28719