[llvm-bugs] [Bug 33345] New: Induction on fast fdiv cannot be vectorized

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Jun 7 09:55:09 PDT 2017


https://bugs.llvm.org/show_bug.cgi?id=33345

            Bug ID: 33345
           Summary: Induction on fast fdiv cannot be vectorized
           Product: libraries
           Version: 4.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: yyc1992 at gmail.com
                CC: llvm-bugs at lists.llvm.org

Similar to https://bugs.llvm.org//show_bug.cgi?id=33338 but with fdiv.

The IR to reproduce is

```
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind readonly uwtable
define double @f(double* nocapture readonly, double* nocapture readonly, i32) local_unnamed_addr #0 {
  %4 = icmp sgt i32 %2, 0
  br i1 %4, label %5, label %.loopexit

; <label>:5:                                      ; preds = %3
  %6 = zext i32 %2 to i64
  br label %8

.loopexit.loopexit:                               ; preds = %8
  br label %.loopexit

.loopexit:                                        ; preds = %.loopexit.loopexit, %3
  %7 = phi double [ 0.000000e+00, %3 ], [ %13, %.loopexit.loopexit ]
  ret double %7

; <label>:8:                                      ; preds = %8, %5
  %9 = phi i64 [ %14, %8 ], [ 0, %5 ]
  %10 = phi double [ %13, %8 ], [ 0.000000e+00, %5 ]
  %11 = getelementptr inbounds double, double* %0, i64 %9
  %12 = load double, double* %11, align 8
  ; %r = fdiv fast double 1.0, %12
  ; %13 = fmul fast double %10, %r
  %13 = fdiv fast double %10, %12
  %14 = add nuw nsw i64 %9, 1
  %15 = icmp eq i64 %14, %6
  br i1 %15, label %.loopexit.loopexit, label %8
}

attributes #0 = { nounwind readonly uwtable "target-features"="+fma" }
```

The commented-out version (`fdiv fast` to compute the reciprocal, followed by
`fmul fast`) vectorizes, whereas the direct `fdiv fast` on the accumulator does
not. Since `fast` implies `arcp`, rewriting the division as a multiplication by
the reciprocal should be allowed, so both forms should be vectorizable.
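For readers who prefer source over IR, here is a rough C sketch of the two loop bodies being compared. It is a reconstruction, not the reporter's original code: the function names and the `init` parameter are made up for illustration (the IR above starts the accumulator at 0.0), and it would need to be compiled with fast-math enabled to produce `fast` flags on the operations.

```c
#include <stddef.h>

/* Sketch of the form reported as NOT vectorizing:
 * the accumulator is divided directly by each element.
 * Names and the `init` parameter are hypothetical. */
double reduce_fdiv(const double *x, size_t n, double init) {
    double acc = init;
    for (size_t i = 0; i < n; ++i)
        acc = acc / x[i];          /* %13 = fdiv fast double %10, %12 */
    return acc;
}

/* Sketch of the form reported as vectorizing:
 * take the reciprocal first, then multiply it into the accumulator. */
double reduce_recip_fmul(const double *x, size_t n, double init) {
    double acc = init;
    for (size_t i = 0; i < n; ++i) {
        double r = 1.0 / x[i];     /* %r  = fdiv fast double 1.0, %12 */
        acc = acc * r;             /* %13 = fmul fast double %10, %r  */
    }
    return acc;
}
```

Under `arcp` the two functions are allowed to produce the same result, which is why one would expect the vectorizer to treat them alike.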

As a bonus point, the original version should not actually need to calculate
the reciprocal first. It should be able to do only one vector `fdiv` per
iteration in the loop and multiply the partial results together afterwards.
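A scalar C emulation of that idea, as I read it, might look like the sketch below. The lane count of 4, the function name and the `init` parameter are assumptions for illustration only; this is not what the vectorizer currently emits, and the reassociation is only valid under fast-math.

```c
#include <stddef.h>

#define LANES 4  /* hypothetical vector width */

/* Emulates a vectorized division reduction: each lane keeps its own
 * accumulator and is divided by its own elements (one "vector fdiv"
 * per iteration); the lanes are multiplied together after the loop. */
double reduce_fdiv_lanes(const double *x, size_t n, double init) {
    double acc[LANES];
    acc[0] = init;                       /* lane 0 carries the start value */
    for (size_t l = 1; l < LANES; ++l)
        acc[l] = 1.0;                    /* identity for the final multiply */

    size_t i = 0;
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; ++l)
            acc[l] = acc[l] / x[i + l];  /* lane-wise division, no reciprocal */

    double result = acc[0];
    for (size_t l = 1; l < LANES; ++l)
        result *= acc[l];                /* horizontal multiply after the loop */

    for (; i < n; ++i)
        result = result / x[i];          /* scalar remainder */
    return result;
}
```

No reciprocal is computed anywhere: the only division is the lane-wise one in the main loop (plus the scalar tail).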
