RFC: min/max/abs IR intrinsics
james at jamesmolloy.co.uk
Sat Apr 25 01:44:09 PDT 2015
I've also previously gone down the lines of a target-isel approach and had
a patch that pretty much worked. I'll explain why I now think this is
better done early in the IR. I should say though that I could be convinced
to go back to the target-isel approach if there are better reasons :)
1. Matching min/max patterns can be awkward in the presence of constants.
Consider the idiomatic max(x, 0) with a conversion after it:
%1 = fcmp ogt float 0.0, float %x
%2 = select i1 %1, float %x, float 0.0
%3 = fptoui float %2 to i32
InstCombine will push the fptoui up:
%1 = fcmp ogt float 0.0, float %x
%2 = fptoui float %x to i32
%3 = select i1 %1, i32 %2, i32 0
Now, you have a much more complex pattern to match. You have to identify
that "i32 0" is the same as "float 0.0" with "fptoui" applied to it. This
gets even more complex when you have nested mins and maxes, for example in
"clamp(x, 0, 255)". It was this that I was having difficulty efficiently
and reliably matching at the ISel level, and what convinced me to move
higher up the pipe where at least testing was easier.
Now actually, we can do even better in IR and move further still up the
pipe, and run before InstCombine. Then, we don't need any complex matching
at all and everything is simpler.
2. As I've mentioned, this approach allows anything using the cost model to
be more accurate. This currently is just the vectorizers but I see other
users coming along soon like loop distribution and loop fusion.
3. This simplifies pattern matching in backends with not just min/max
support, but also with explicit "clamp" support such as GPUs. GPUs will
often have (almost free) clamp_0_255 instructions or modifiers as this is
used extremely often in image filters and shaders.
4. Doing this at the IR level is immediately useful for all backends, not
just (one of) AArch64 or AArch32.
5. To make sure the clamp() idiom got matched, I upped the amount that
SimplifyCFG would speculate (in r229099). This caused some slight
controversy - it is still in-tree but there is still the potential of
pulling it out if it is found that it was too aggressive. The IR approach
means we don't actually need it and can pattern match across basic blocks,
which we can't do in ISel (and it'd be too late by then because all the
vectorizers will have bailed out).
As you've said Gerolf, we have patchy max/min/abs matching and handling in
many places in codegen; it would be nice if this approach could remove them
all. My only worry is that at the moment, if a target does not support the
new intrinsics they will be expanded, but will be expanded only after the
DAG is type-legalized and so may miss out on some scalar optimizations. I'm
not sure how well-founded this worry is though.
On Sat, 25 Apr 2015 at 04:19 Owen Anderson <resistor at mac.com> wrote:
> On Apr 24, 2015, at 2:19 PM, Gerolf Hoflehner <ghoflehner at apple.com>
> I was wondering if there were a performance advantage to have a higher
> level representation of min/max. My hunch is that recognizing the idiom at
> the low-level is ‘good enough’.
> I suspect (aside from vectorization) the impact to be minimal for integer
> min/max. For floating point, I expect there to be greater upside because
> of ISA level support, and because the SW emulation expansion is typically
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-commits