[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

Thu Sep 8 10:27:34 PDT 2011

On Thu, Sep 08, 2011 at 11:15:06AM -0500, Villmow, Micah wrote:
> Peter,
>  Is there a way to make this flag globally available? Metadata can be fairly expensive to handle at each node when in many cases it is a global flag and not a per operation flag.

There are two main reasons why I think we shouldn't go for global
flags:

1) It becomes difficult if not impossible to correctly link together
   modules with different accuracy requirements, especially if LTO
   is done on the combined module.

2) Some LLVM optimisations will create operations with an accuracy
   requirement different from the language-specified accuracy.
   For example, consider the following OpenCL kernel:

-----
#pragma OPENCL EXTENSION cl_khr_fp64: enable

__kernel void dpdiv(__global float *result, float x, float y) {
  *result = (double) x / (double) y;
}
-----

When compiled to LLVM with optimisations turned off, the function
looks like this:

-----
define void @dpdiv(float* %result, float %x, float %y) nounwind uwtable {
entry:
  %result.addr = alloca float*, align 8
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  store float* %result, float** %result.addr, align 8
  store float %x, float* %x.addr, align 4
  store float %y, float* %y.addr, align 4
  %tmp = load float* %x.addr, align 4
  %conv = fpext float %tmp to double
  %tmp1 = load float* %y.addr, align 4
  %conv2 = fpext float %tmp1 to double
  %div = fdiv double %conv, %conv2
  %conv3 = fptrunc double %div to float
  %tmp4 = load float** %result.addr, align 8
  store float %conv3, float* %tmp4
  ret void
}
-----

With optimisations turned on:

-----
define void @dpdiv(float* nocapture %result, float %x, float %y) nounwind uwtable {
entry:
  %conv3 = fdiv float %x, %y
  store float %conv3, float* %result, align 4, !tbaa !1
  ret void
}
-----

The main optimisation applied here is near the top of InstCombiner::visitFPTrunc,
which simplifies fptrunc(fdiv(fpext x, fpext y)) to fdiv(x, y).
Because double precision floating point divides are accurate (correctly
rounded) in OpenCL, the single precision divide in the optimised code
must also be accurate, unlike a "direct" single precision divide,
which the language allows to be less precise.
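
To illustrate why the simplification above is only valid for an accurate
single precision divide, here is a rough host-side sketch in C (not OpenCL).
It assumes the host compiler implements IEEE 754 correctly rounded float and
double division and is not run in a fast-math mode. The helper names
(div_via_double, div_direct, next_float, count_mismatches,
check_divide_equivalence) are made up for this example, and sampling a range
of inputs is of course evidence rather than a proof.

```c
#include <assert.h>
#include <stdint.h>

/* Divide in double precision, then round the quotient to float.
   Mirrors the unoptimised kernel: fpext, fpext, fdiv double, fptrunc. */
static float div_via_double(float x, float y) {
    return (float)((double)x / (double)y);
}

/* Divide directly in single precision, as in the optimised kernel.
   The volatile store forces the quotient to be rounded to float even if
   the host evaluates expressions in a wider format. */
static float div_direct(float x, float y) {
    volatile float q = x / y;
    return q;
}

/* Hypothetical helper: deterministic pseudo-random float in [1, 2). */
static float next_float(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;   /* one LCG step */
    uint32_t mantissa = *state >> 9;            /* keep the top 23 bits */
    return 1.0f + (float)mantissa / 8388608.0f; /* 8388608 = 2^23 */
}

/* Count the sampled (x, y) pairs for which the two divides disagree. */
static unsigned count_mismatches(unsigned n) {
    uint32_t state = 12345u;
    unsigned mismatches = 0;
    for (unsigned i = 0; i < n; ++i) {
        float x = next_float(&state);
        float y = next_float(&state);
        if (div_via_double(x, y) != div_direct(x, y))
            ++mismatches;
    }
    return mismatches;
}

/* With a correctly rounded float divide, no sampled pair should differ. */
static void check_divide_equivalence(void) {
    assert(count_mismatches(100000u) == 0);
}
```

Calling check_divide_equivalence() from a test program should pass on such
a host. The agreement depends entirely on div_direct being correctly
rounded: if the target were free to implement the float divide with a looser
error bound, as OpenCL permits for an ordinary single precision divide, the
two functions could differ, which is why the fdiv produced by the optimiser
needs its own, stricter accuracy requirement.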

I would imagine that creating a pinned metadata name for fpaccuracy, as
we currently do for dbg, tbaa and prof, would go some way towards addressing
the efficiency problem.

Thanks,
-- 
Peter