[PATCH] D44102: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions.

Justin Lebar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 5 12:21:30 PST 2018


jlebar added a comment.

Disappointingly, this doesn't work for simple cases where you mask the dividend:

  %b = and i64 %a, 65535
  %div = udiv i64 %b, 42
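
Spelled out as a full test (mirroring the llvm.assume one below; the wrapper function and the urem are just scaffolding for opt), the masked case would be something like:

  target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
  target triple = "nvptx64-nvidia-cuda"

  ; The range of %b comes from the mask rather than from an assume.
  define void @bar(i64 %a, i64* %ptr1, i64* %ptr2) {
    %b = and i64 %a, 65535
    %div = udiv i64 %b, 42
    %rem = urem i64 %b, 42
    store i64 %div, i64* %ptr1
    store i64 %rem, i64* %ptr2
    ret void
  }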

It does work for llvm.assume, which I guess is good enough for the specific case I have, but maybe this is not the right pass to be doing this in?  Or should I check known bits here too?  Sorry, I'm an ignoramus when it comes to the target-independent parts of LLVM.

  target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
  target triple = "nvptx64-nvidia-cuda"
  
  declare void @llvm.assume(i1)
  
  define void @foo(i64 %a, i64* %ptr1, i64* %ptr2) {
    %cond = icmp ult i64 %a, 1024
    call void @llvm.assume(i1 %cond)
    %div = udiv i64 %a, 42
    %rem = urem i64 %a, 42
    store i64 %div, i64* %ptr1
    store i64 %rem, i64* %ptr2
    ret void
  }

becomes, at `opt -O2`:

  define void @foo(i64 %a, i64* nocapture %ptr1, i64* nocapture %ptr2) local_unnamed_addr #0 {
    %cond = icmp ult i64 %a, 1024
    tail call void @llvm.assume(i1 %cond)
    %div.lhs.trunc = trunc i64 %a to i16
    %div1 = udiv i16 %div.lhs.trunc, 42
    %div.zext = zext i16 %div1 to i64
    %1 = mul i16 %div1, 42
    %2 = sub i16 %div.lhs.trunc, %1
    %rem.zext = zext i16 %2 to i64
    store i64 %div.zext, i64* %ptr1, align 8
    store i64 %rem.zext, i64* %ptr2, align 8
    ret void
  }

which lowers to the following PTX:

  shr.u16         %rs2, %rs1, 1;
  mul.wide.u16    %r1, %rs2, -15603;
  shr.u32         %r2, %r1, 20;
  cvt.u16.u32     %rs3, %r2;
  cvt.u64.u32     %rd3, %r2;
  mul.lo.s16      %rs4, %rs3, 42;
  sub.s16         %rs5, %rs1, %rs4;
  cvt.u64.u16     %rd4, %rs5;
  st.u64  [%rd1], %rd3;
  st.u64  [%rd2], %rd4;

This is even nicer than before, because the magic-number division is now done with a 16-bit-widening-to-32-bit multiply instead of a 32-bit-widening-to-64-bit one.  At least, I hope that's efficient in NVPTX -- if not, that's our backend's problem.  :)
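
(For anyone puzzled by the constants: if I'm reading the PTX right, -15603 taken as an unsigned i16 is 49933 = ceil(2^20 / 21), and since 42 = 2 * 21, the shr/mul.wide/shr sequence above is just the usual round-up magic-number trick:

  x / 42 = (x / 2) / 21 = ((x >> 1) * 49933) >> 20    for any 16-bit x

now done with a 16x16 -> 32-bit multiply.)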


https://reviews.llvm.org/D44102
