[PATCH] D75179: [AMDGPU][ConstantFolding] Fold llvm.amdgcn.fract intrinsic
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 27 06:36:59 PST 2020
foad marked an inline comment as done.
foad added inline comments.
================
Comment at: llvm/lib/Analysis/ConstantFolding.cpp:1797
+ if (IntrinsicID == Intrinsic::amdgcn_fract) {
+ // TODO what should amdgcn_fract return for tiny negative arguments?
+ // GLSL defines fract(x) as x - floor(x).
----------------
foad wrote:
> foad wrote:
> > arsenm wrote:
> > > foad wrote:
> > > > arsenm wrote:
> > > > > This should match the instruction behavior (although I guess we can ignore the bug on SI)
> > > > Is there a good public reference for that? The Vega ISA Reference Guide doesn't go into much detail.
> > > This is always a problem, and no. I just go by this comment:
> > >
> > > ```
> > > // V_FRACT is buggy on SI, so the F32 version is never used and (x-floor(x)) is
> > > // used instead. However, SI doesn't have V_FLOOR_F64, so the most efficient
> > > // way to implement it is using V_FRACT_F64.
> > > // The workaround for the V_FRACT bug is:
> > > // fract(x) = isnan(x) ? x : min(V_FRACT(x), 0.99999999999999999)
> > >
> > > // Convert floor(x) to (x - fract(x))
> > > ```
> > >
> > OK, so it sounds like the (non-buggy) hardware uses the same trick as the OpenCL definition, to avoid ever returning 1.0. I'll try to confirm that on some real hardware.
> I've confirmed this for f16 and f32 types, on some real gfx9 hardware.
... and confirmed for f64 too.
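
For illustration only, the behaviour discussed above (never returning 1.0, NaN passed through) could be modelled on the host roughly as follows. This is a minimal sketch, not the code in this patch: the name `fractClamped` is hypothetical, and using `std::nextafter` to obtain the largest value below 1.0 is an assumption standing in for the literal constant in the quoted comment. Infinities are not modelled here.

```cpp
#include <cmath>

// Hypothetical helper (not part of D75179): fract(x) with the result clamped
// so that it is always strictly less than 1.0, as described in the thread.
template <typename T>
T fractClamped(T X) {
  // NaN inputs are passed through unchanged, as in the quoted workaround.
  if (std::isnan(X))
    return X;

  // GLSL-style definition. For tiny negative X this rounds up to exactly 1.0.
  T Frac = X - std::floor(X);

  // Largest representable value below 1.0 for this type (assumed clamp value).
  T BelowOne = std::nextafter(T(1), T(0));

  // Clamp so the result stays in [0, 1). Infinite inputs are not modelled.
  return std::fmin(Frac, BelowOne);
}
```

With this sketch, a tiny negative input such as `-1e-30f` yields the largest float below 1.0 rather than 1.0 itself, which is the case the TODO in the patch asks about.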
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D75179/new/
https://reviews.llvm.org/D75179