[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
Villmow, Micah
Micah.Villmow at amd.com
Thu Sep 8 10:44:02 PDT 2011
> -----Original Message-----
> From: Peter Collingbourne [mailto:peter at pcc.me.uk]
> Sent: Thursday, September 08, 2011 10:28 AM
> To: Villmow, Micah
> Cc: Robert Quill; anton.lokhmotov at arm.com; cfe-dev at cs.uiuc.edu;
> llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [cfe-dev] Proposal: floating point accuracy
> metadata (OpenCL related)
>
> On Thu, Sep 08, 2011 at 11:15:06AM -0500, Villmow, Micah wrote:
> > Peter,
> > Is there a way to make this flag globally available? Metadata can be
> > fairly expensive to handle at each node when in many cases it is a
> > global flag and not a per operation flag.
>
> There are two main reasons why I think we shouldn't go for global
> flags:
>
> 1) It becomes difficult if not impossible to correctly link together
> modules with different accuracy requirements, especially if LTO
> is done on the combined module.
>
> 2) Some LLVM optimisations will create operations with an accuracy
> requirement different from the language-specified accuracy.
> For example, consider the following OpenCL kernel:
>
> -----
> #pragma OPENCL EXTENSION cl_khr_fp64: enable
>
> __kernel void dpdiv(__global float *result, float x, float y) {
>   *result = (double) x / (double) y;
> }
> -----
>
> When compiled to LLVM with optimisations turned off, the function
> looks like this:
>
> -----
> define void @dpdiv(float* %result, float %x, float %y) nounwind uwtable {
> entry:
>   %result.addr = alloca float*, align 8
>   %x.addr = alloca float, align 4
>   %y.addr = alloca float, align 4
>   store float* %result, float** %result.addr, align 8
>   store float %x, float* %x.addr, align 4
>   store float %y, float* %y.addr, align 4
>   %tmp = load float* %x.addr, align 4
>   %conv = fpext float %tmp to double
>   %tmp1 = load float* %y.addr, align 4
>   %conv2 = fpext float %tmp1 to double
>   %div = fdiv double %conv, %conv2
>   %conv3 = fptrunc double %div to float
>   %tmp4 = load float** %result.addr, align 8
>   store float %conv3, float* %tmp4
>   ret void
> }
> -----
>
> With optimisations turned on:
>
> -----
> define void @dpdiv(float* nocapture %result, float %x, float %y)
>     nounwind uwtable {
> entry:
>   %conv3 = fdiv float %x, %y
>   store float %conv3, float* %result, align 4, !tbaa !1
>   ret void
> }
> -----
>
> The main optimisation applied here is near the top of
> InstCombiner::visitFPTrunc, which simplifies
> fptrunc(fdiv(fpext x, fpext y)) to fdiv(x, y).
> Because double precision floating point divides must be correctly
> rounded in OpenCL, the single precision divide in the optimised code
> must also be correctly rounded, unlike a "direct" single precision
> divide, which OpenCL permits to be less accurate.
>
> I would imagine that creating a pinned metadata name for fpaccuracy, as
> we currently do for dbg, tbaa and prof, would go some way towards
> addressing the efficiency problem.
[Villmow, Micah] Yeah, that could work also.
>
> Thanks,
> --
> Peter
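
For reference, the fold quoted above relies on the fact that, under IEEE 754 round-to-nearest, dividing two floats in double precision and rounding the quotient back to float should give the same bits as a correctly rounded single precision divide. The C sketch below is only an illustration of that assumption, not anything taken from InstCombine; the helper names (div_via_double, div_direct, float_bits, count_mismatches) and the choice of test operands are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The fptrunc(fdiv(fpext x, fpext y)) form: divide in double precision,
   then round the quotient back to float. */
float div_via_double(float x, float y) {
    return (float)((double)x / (double)y);
}

/* The fdiv(x, y) form: divide directly in single precision.
   The volatile store forces the quotient to be rounded to float even on
   targets that evaluate float expressions in a wider format. */
float div_direct(float x, float y) {
    volatile float q = x / y;
    return q;
}

/* Bit pattern of a float, so that results are compared exactly. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical check: divide n pseudo-random operand pairs drawn from
   [1, 2] both ways and count the pairs whose results differ. */
unsigned count_mismatches(unsigned n) {
    uint32_t state = 12345u;  /* seed of a simple linear congruential generator */
    unsigned mismatches = 0;
    for (unsigned i = 0; i < n; ++i) {
        state = state * 1664525u + 1013904223u;
        float x = 1.0f + (float)(state >> 8) / 16777216.0f;
        state = state * 1664525u + 1013904223u;
        float y = 1.0f + (float)(state >> 8) / 16777216.0f;
        if (float_bits(div_via_double(x, y)) != float_bits(div_direct(x, y)))
            ++mismatches;
    }
    return mismatches;
}
```

Calling count_mismatches from a small main is enough to exercise it. On a host whose float divide is correctly rounded the count is expected to be zero, which is why the fold is value-preserving there. An OpenCL device whose single precision divide is only accurate to a few ulp would not pass the same check, and that is the case the per-operation metadata is meant to cover.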