[PATCH] D57044: [AArch64] Optimize Inf materialization
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 23 12:25:44 PST 2019
efriedma added a comment.
It would be a one-line change to make the AArch64 target treat all FP immediates as legal (this was implemented in https://reviews.llvm.org/rL223941, but only enabled for the large code model).
What's actually profitable is not entirely obvious. Going by timings on a Cortex-A57, the cost is exactly the same for mov+fmov vs. adrp+ldr, so the mov+fmov sequence is always better because of the reduced cache pressure. The timings are still the same for movw+movk+fmov vs. adrp+ldr (it's one instruction longer, but the movw+movk pair is fused). For f64, though, it's not obviously worthwhile to emit a five-instruction sequence instead of a two-instruction constant-pool load. Some quick testing with gcc shows it emits mov+movk, but not longer sequences.
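To make the instruction counts above concrete, here is a rough sketch that estimates the length of the integer-materialize-then-fmov sequence from the bit pattern of a constant. It assumes a simplified model that is not LLVM's actual expansion logic: one mov/movk per nonzero 16-bit chunk, plus one fmov, ignoring MOVN and logical-immediate encodings. The names gprSequenceLength and sequenceLength are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Simplified model (an assumption): one mov/movk per nonzero 16-bit chunk of
// the bit pattern, plus one fmov to move the result into an FP register.
// An all-zero pattern needs only the fmov (from the zero register).
inline unsigned gprSequenceLength(uint64_t Bits, unsigned Width) {
  assert((Width == 32 || Width == 64) && "only f32 and f64 are modeled");
  unsigned Chunks = 0;
  for (unsigned Shift = 0; Shift < Width; Shift += 16)
    if ((Bits >> Shift) & 0xFFFF)
      ++Chunks;
  return Chunks + 1;
}

inline unsigned sequenceLength(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits)); // bitcast f32 -> i32
  return gprSequenceLength(Bits, 32);
}

inline unsigned sequenceLength(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits)); // bitcast f64 -> i64
  return gprSequenceLength(Bits, 64);
}
```

Under this model, +Inf is a two-instruction sequence for both f32 (0x7F800000) and f64 (0x7FF0000000000000), an arbitrary f32 such as 0.1f is three, and an arbitrary f64 such as 0.1 is five.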
We could maybe add a separate target hook for DAGCombiner::SimplifySelectCC to distinguish between illegal immediates and immediates which are legal but expensive. It's probably worth revising that code anyway; it currently doesn't consider the possibility of transforming to an integer SELECT_CC, then BITCASTing the result to a float.
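As an illustration of the integer SELECT_CC plus BITCAST idea, the following sketch shows the equivalence at the source level rather than in SelectionDAG. The function names are hypothetical and the two constants (+Inf and 1.0f) are chosen only as an example.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret a 32-bit integer bit pattern as an f32 (the BITCAST step).
inline float bitcastToFloat(uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Select between two FP constants directly.
inline float selectFP(bool Cond) {
  return Cond ? std::numeric_limits<float>::infinity() : 1.0f;
}

// Select between the integer bit patterns of the same two constants,
// then bitcast the chosen pattern to float.
inline float selectIntThenBitcast(bool Cond) {
  uint32_t Bits = Cond ? 0x7F800000u  // bit pattern of +Inf
                       : 0x3F800000u; // bit pattern of 1.0f
  return bitcastToFloat(Bits);
}
```

Both functions return the same value for either condition; the second form only needs integer immediates plus a single move into an FP register.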
We could also add code to use the SIMD MOVI for floating-point immediates in certain cases (such immediates are probably not that common, but you can make some interesting numbers like 32.f and -0.f).
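For reference, a minimal sketch of the check this would need for f32, assuming only the 32-bit shifted-immediate form of MOVI (an 8-bit immediate shifted left by 0, 8, 16, or 24) and ignoring the other MOVI variants. The helper name isMovi32ShiftedImm is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the f32 bit pattern has the form imm8 << {0, 8, 16, 24},
// i.e. at most one byte of the 32-bit pattern is nonzero.
inline bool isMovi32ShiftedImm(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits)); // bitcast f32 -> i32
  for (unsigned Shift = 0; Shift < 32; Shift += 8)
    if ((Bits & ~(uint32_t(0xFF) << Shift)) == 0)
      return true;
  return false;
}
```

With this check, 32.f (0x42000000) and -0.f (0x80000000) both qualify as 0x42 << 24 and 0x80 << 24 respectively, while 1.0f (0x3F800000) and +Inf (0x7F800000) do not, because their nonzero bits span two bytes.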
In any case, it doesn't really make sense to special-case infinity, specifically.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57044/new/
https://reviews.llvm.org/D57044