[llvm-dev] Re: [RFC] Improve iteration of estimating divisions

Thu Aug 8 09:58:18 PDT 2019

I think it's certainly worth posting a patch so that we can evaluate it.

Thanks again,

Hal

Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

________________________________
From: Qiu Chaofan <qcf.ibm at outlook.com>
Sent: Thursday, August 8, 2019 11:47 AM
To: Finkel, Hal J. <hfinkel at anl.gov>; llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] Improve iteration of estimating divisions

Hal,

Yes, speed is an important factor in making the decision. My change just folds the numerator into the estimate, so it adds no extra instructions. With the simple benchmark below, the running time is the same for my prototype and for current master:

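```
/* Division under test. With -Ofast, Clang may replace it with a reciprocal
   estimate refined by Newton-Raphson steps and multiplied by the numerator. */
float fdiv(unsigned int a, unsigned int b) {
  return (float)a / (float)b;
}

float m;

/* Out-of-line store between calls, intended to keep the loop body
   from being optimized away. */
__attribute__((noinline)) void foo() {
  m = 0.0f;
}

int main() {
  for (int i = 1; i < 1000000; ++i)
    for (int j = 1; j < 30000; ++j) {
      m = fdiv(i, j);
      foo();
    }
  return 0;
}
```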

Regards,
Qiu Chaofan

________________________________________
From: Finkel, Hal J. <hfinkel at anl.gov>
Sent: August 7, 2019 4:04
To: 邱 超凡; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [RFC] Improve iteration of estimating divisions

On 8/6/19 12:20 AM, 邱 超凡 via llvm-dev wrote:
> Hi there, I noticed that our current implementation of the fast division transformation (turning `a / b` into `a * (1/b)`) is less precise than GCC's, as in this case on ppc64le:
>
>          float fdiv(unsigned int a, unsigned int b) {
>                  return (float)a / (float)b;
>          }
>
> With Clang -Ofast the result is 0x41A00001 (20.000002f, one ULP above 20.0f), while GCC produces 0x41A00000 (exactly 20.0f), the same result as with optimizations disabled.
>
> Currently, DAGCombiner uses `BuildReciprocalEstimate` to compute the reciprocal (`1/b`) first and then multiplies it by `a`. If we instead fold the operand `a` into the Newton-Raphson iterations inside the estimate function, the result would be more accurate.
>
> Because this is target-independent code, such a change may break several existing test cases on different platforms. Any suggestions are welcome. Thanks.

Test cases can be changed if the result is universally better.
Alternatively, we can introduce a way for the target to control the
behavior (e.g., how we choose between buildSqrtNROneConst and
buildSqrtNRTwoConst). What's the effect on performance?
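For reference, a minimal C sketch of the two reciprocal-square-root step forms mentioned above, assuming they follow the formulas given in DAGCombiner's source comments: X * (1.5 - (A/2) * X * X) for the one-constant form and (-0.5 * X) * (A * X * X - 3.0) for the two-constant form. The function names are hypothetical. The two forms are algebraically identical Newton-Raphson steps for 1/sqrt(A); they differ only in constants and operation order, and therefore in rounding, which is the kind of target-selectable choice being suggested for the division estimate.

```c
#include <assert.h>

/* One-constant form: x * (1.5 - (a/2) * x * x); a/2 is computed once,
   before the loop. */
static float rsqrt_one_const(float a, float x, int iters) {
  assert(a > 0.0f);
  const float half_a = 0.5f * a;
  for (int i = 0; i < iters; ++i)
    x = x * (1.5f - half_a * x * x);
  return x;
}

/* Two-constant form: (-0.5 * x) * (a * x * x - 3.0). */
static float rsqrt_two_const(float a, float x, int iters) {
  assert(a > 0.0f);
  for (int i = 0; i < iters; ++i)
    x = (-0.5f * x) * (a * x * x - 3.0f);
  return x;
}
```

Starting from a rough guess such as `0.49f` for `a = 4.0f`, three steps of either form converge to within float rounding of `0.5f`.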

  -Hal

>
> Regards,
> Qiu Chaofan
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

