[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

Thu Sep 8 10:44:02 PDT 2011


> -----Original Message-----
> From: Peter Collingbourne [mailto:peter at pcc.me.uk]
> Sent: Thursday, September 08, 2011 10:28 AM
> To: Villmow, Micah
> Cc: Robert Quill; anton.lokhmotov at arm.com; cfe-dev at cs.uiuc.edu;
> llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [cfe-dev] Proposal: floating point accuracy
> metadata (OpenCL related)
> 
> On Thu, Sep 08, 2011 at 11:15:06AM -0500, Villmow, Micah wrote:
> > Peter,
> >  Is there a way to make this flag globally available? Metadata can be
> > fairly expensive to handle at each node when in many cases it is a
> > global flag and not a per-operation flag.
> 
> There are two main reasons why I think we shouldn't go for global
> flags:
> 
> 1) It becomes difficult if not impossible to correctly link together
>    modules with different accuracy requirements, especially if LTO
>    is done on the combined module.
> 
> 2) Some LLVM optimisations will create operations with an accuracy
>    requirement different from the language-specified accuracy.
>    For example, consider the following OpenCL kernel:
> 
> -----
> #pragma OPENCL EXTENSION cl_khr_fp64: enable
> 
> __kernel void dpdiv(__global float *result, float x, float y) {
>   *result = (double) x / (double) y;
> }
> -----
> 
> When compiled to LLVM with optimisations turned off, the function
> looks like this:
> 
> -----
> define void @dpdiv(float* %result, float %x, float %y) nounwind uwtable
> {
> entry:
>   %result.addr = alloca float*, align 8
>   %x.addr = alloca float, align 4
>   %y.addr = alloca float, align 4
>   store float* %result, float** %result.addr, align 8
>   store float %x, float* %x.addr, align 4
>   store float %y, float* %y.addr, align 4
>   %tmp = load float* %x.addr, align 4
>   %conv = fpext float %tmp to double
>   %tmp1 = load float* %y.addr, align 4
>   %conv2 = fpext float %tmp1 to double
>   %div = fdiv double %conv, %conv2
>   %conv3 = fptrunc double %div to float
>   %tmp4 = load float** %result.addr, align 8
>   store float %conv3, float* %tmp4
>   ret void
> }
> -----
> 
> With optimisations turned on:
> 
> -----
> define void @dpdiv(float* nocapture %result, float %x, float %y)
> nounwind uwtable {
> entry:
>   %conv3 = fdiv float %x, %y
>   store float %conv3, float* %result, align 4, !tbaa !1
>   ret void
> }
> -----
> 
> The main optimisation applied here is near the top of
> InstCombiner::visitFPTrunc, which simplifies
> fptrunc(fdiv(fpext x, fpext y)) to fdiv(x, y).
> Because double-precision floating-point divides must be correctly
> rounded in OpenCL, the single-precision divide in the optimised code
> must also be correctly rounded, unlike a "direct" single-precision
> divide, which is permitted a larger error.
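
As a rough illustration of why the folding described above is only safe
when the resulting single-precision divide keeps the stricter accuracy
requirement, here is a minimal C sketch. It assumes a host with IEEE 754
binary32/binary64 arithmetic, round-to-nearest, and no fast-math flags.
All function names are hypothetical and exist only for this example. It
compares "divide in double, then truncate" against a correctly rounded
float divide, bit for bit, on pseudo-random operands in [1, 2).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unoptimised form: fpext both operands, fdiv in double, fptrunc. */
static float div_via_double(float x, float y) {
    volatile double q = (double)x / (double)y; /* volatile: force a real double */
    return (float)q;
}

/* Optimised form: a single float divide (correctly rounded on an
   IEEE 754 host, which is an assumption of this sketch). */
static float div_direct(float x, float y) {
    volatile float q = x / y;
    return q;
}

/* Bitwise comparison, so that any 1-ulp difference is detected. */
static int same_bits(float a, float b) {
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* Small linear congruential generator, used only to pick operands. */
static uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Build a float in [1, 2) from 23 pseudo-random mantissa bits. */
static float sample(uint32_t *state) {
    uint32_t bits = 0x3f800000u | (lcg_next(state) >> 9);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Count operand pairs for which the two forms give different bits. */
static int count_mismatches(uint32_t seed, int n) {
    uint32_t state = seed;
    int mismatches = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        float x = sample(&state);
        float y = sample(&state);
        if (!same_bits(div_via_double(x, y), div_direct(x, y)))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches(1u, 100000) should report no mismatches under
the stated assumptions, because double carries enough extra precision
that the second rounding does not change the result. A float divide that
is allowed a looser error bound would not give that guarantee, which is
why the optimised fdiv has to carry the stricter requirement with it.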
> 
> I would imagine that creating a pinned metadata name for fpaccuracy, as
> we currently do for dbg, tbaa and prof, would go some way towards
> addressing the efficiency problem.
[Villmow, Micah] Yeah, that could work also.
> 
> Thanks,
> --
> Peter