[PATCH] D57044: [AArch64] Optimize Inf materialization

Wed Jan 23 12:25:44 PST 2019

efriedma added a comment.

It would be a one-line change to make the AArch64 target treat all FP immediates as legal (this was implemented in https://reviews.llvm.org/rL223941 , but only enabled for the large code model).

In terms of what's actually profitable, it's not entirely obvious.  If you look at timings on a Cortex-A57, the cost is exactly the same for mov+fmov vs. adrp+ldr, so the mov+fmov sequence is always better because of the reduced cache pressure.  The timings are still the same if you consider movw+movk+fmov vs. adrp+ldr (it's one instruction longer, but the movw+movk pair is fused).  For f64, though, it's not obviously worthwhile to emit a five-instruction sequence vs. a two-instruction constant-pool load.  Some quick testing with gcc shows it emits mov+movk, but not longer sequences.
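
To make the sequence lengths above concrete, here is a rough sketch that estimates how many instructions an integer-move-plus-fmov materialization would take for a given constant. It is only a simplified model, not what the backend does: it assumes IEEE-754 float/double, counts one move per non-zero 16-bit chunk of the bit pattern plus one fmov, and ignores MOVN and logical-immediate encodings, which can shorten some sequences. The names countNonZeroChunks, bitsOf and sequenceLength are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Number of non-zero 16-bit chunks in the low `width` bits of `bits`.
// In this simplified model each non-zero chunk costs one integer move
// (the first one a plain mov, every further one a movk).
unsigned countNonZeroChunks(std::uint64_t bits, unsigned width) {
  assert(width == 32 || width == 64);
  unsigned count = 0;
  for (unsigned shift = 0; shift < width; shift += 16)
    if ((bits >> shift) & 0xFFFF)
      ++count;
  return count;
}

// Raw bit pattern of an f32 value (assumes IEEE-754 binary32).
std::uint64_t bitsOf(float value) {
  std::uint32_t raw;
  std::memcpy(&raw, &value, sizeof raw);
  return raw;
}

// Raw bit pattern of an f64 value (assumes IEEE-754 binary64).
std::uint64_t bitsOf(double value) {
  std::uint64_t raw;
  std::memcpy(&raw, &value, sizeof raw);
  return raw;
}

// Hypothetical sequence length: the integer moves plus the final fmov.
// A zero bit pattern needs no integer move (fmov from the zero register).
unsigned sequenceLength(float value) {
  return countNonZeroChunks(bitsOf(value), 32) + 1;
}

unsigned sequenceLength(double value) {
  return countNonZeroChunks(bitsOf(value), 64) + 1;
}
```

Under this model, f32 +Inf (0x7F800000) has one non-zero chunk, so it is a mov+fmov pair; 0.1f (0x3DCCCCCD) has two, giving the three-instruction form; and the f64 value 0.1 (0x3FB999999999999A) has four, giving the five-instruction worst case.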

We could add a separate target hook for DAGCombiner::SimplifySelectCC, maybe, to distinguish between illegal immediates and immediates which are legal-but-expensive.  Probably worth revising that code anyway; it currently doesn't consider the possibility of transforming to an integer SELECT_CC, then BITCASTing the result to a float.

We could also add code to use the SIMD MOVI for floating-point immediates in certain cases (such immediates are probably not that common, but you can make some interesting numbers like 32.f and -0.f).
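
As a rough illustration of which f32 constants that would cover, the sketch below checks whether a bit pattern is a single byte shifted left by 0, 8, 16 or 24 bits, which is the shape of the 32-bit shifted-immediate form of MOVI. It assumes IEEE-754 floats and deliberately ignores the other MOVI variants (the "shifting ones" form and the 64-bit byte-mask form). The helper names are made up for illustration and are not LLVM APIs.

```cpp
#include <cstdint>
#include <cstring>

// Raw bit pattern of an f32 value (assumes IEEE-754 binary32).
std::uint32_t floatBits(float value) {
  std::uint32_t raw;
  std::memcpy(&raw, &value, sizeof raw);
  return raw;
}

// True if `bits` is an 8-bit value shifted left by 0, 8, 16 or 24,
// i.e. at most one of its four bytes is non-zero.
bool isShiftedByteImm32(std::uint32_t bits) {
  for (unsigned shift = 0; shift < 32; shift += 8)
    if ((bits & ~(std::uint32_t{0xFF} << shift)) == 0)
      return true;
  return false;
}
```

With this check, 32.f (0x42000000) and -0.f (0x80000000) both qualify, since only their top byte is set, while 1.0f (0x3F800000) and 0.1f (0x3DCCCCCD) do not.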

In any case, it doesn't really make sense to special-case infinity specifically.

https://reviews.llvm.org/D57044