[PATCH] D81340: [ARM] Split FPExt loads

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 9 00:31:01 PDT 2020


dmgreen added a comment.

In D81340#2081238 <https://reviews.llvm.org/D81340#2081238>, @efriedma wrote:

> You prefer to do more loads, as opposed to doing a single load and shuffling the result?  Are loads really that cheap, or is the alternative just too terrible?


Yeah, both actually. Loads are expected to be cheap (you can usually do a load with no stalls into a following MVE instruction), and the alternative is needing to shuffle every lane into and out of registers.
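As a rough illustration of the "more loads" option, here is a minimal Python model of covering eight 16-bit values with two extending loads of four values each. This is only a sketch of the idea, not the lowering from the patch: the function names are hypothetical, the data is made up, and Python integers have no fixed width, so the "widening" is notional.

```python
import struct

def widening_load_s16(mem, byte_offset):
    """Hypothetical model of an extending load: read four little-endian
    signed 16-bit values; each one lands in its own (notional) 32-bit lane."""
    return list(struct.unpack_from("<4h", mem, byte_offset))

def split_extending_load(mem, byte_offset=0):
    """Cover eight 16-bit elements with two narrower extending loads."""
    lo = widening_load_s16(mem, byte_offset)      # elements 0..3
    hi = widening_load_s16(mem, byte_offset + 8)  # elements 4..7 (8 bytes on)
    return lo, hi

# Illustrative data only: eight signed 16-bit values in memory.
memory = struct.pack("<8h", 1, -2, 3, -4, 5, -6, 7, -8)
lo, hi = split_extending_load(memory)
```

Both halves come out in memory order with no lane shuffling, which is the point of preferring the extra load.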

MVE was designed with this "beats" system in mind, where 32-bit chunks of the vector can architecturally overlap. Any instructions that cross beats are deemed to be expensive, and many that you would expect just don't exist. So there is nothing that takes the bottom 4 lanes of a v8i16 and extends them into a v4i32. All the extends are done as values are loaded, or are done with top/bottom (t/b) instructions like vmovlt/vmovlb.
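To make the lane picture concrete, here is a small Python model of the lane selection done by the t/b extends on a v8i16, next to the "extend the bottom 4 lanes" operation that has no single instruction. It is a sketch of the semantics only: the function names are hypothetical, lanes are held as raw bit patterns, and it assumes the usual definition where the bottom variant reads the even-numbered 16-bit lanes and the top variant reads the odd-numbered ones.

```python
def sext16_to_32(x):
    """Sign-extend a raw 16-bit pattern to a raw 32-bit pattern."""
    x &= 0xFFFF
    return (x | 0xFFFF0000) if x & 0x8000 else x

def vmovlb_s16(v):
    """Model of the bottom extend: even 16-bit lanes -> 32-bit lanes.
    Output lane i only reads from input lanes 2*i, i.e. the same 32-bit chunk."""
    assert len(v) == 8
    return [sext16_to_32(v[2 * i]) for i in range(4)]

def vmovlt_s16(v):
    """Model of the top extend: odd 16-bit lanes -> 32-bit lanes."""
    assert len(v) == 8
    return [sext16_to_32(v[2 * i + 1]) for i in range(4)]

def extend_low_half(v):
    """The operation with no single instruction: lanes 0..3 extended in order.
    Output lane 1 reads input lane 1, which sits in a different 32-bit chunk."""
    assert len(v) == 8
    return [sext16_to_32(x) for x in v[:4]]

# Illustrative v8i16 holding 1, -2, 3, -4, 5, -6, 7, -8 as raw 16-bit patterns.
v = [0x0001, 0xFFFE, 0x0003, 0xFFFC, 0x0005, 0xFFFA, 0x0007, 0xFFF8]
bottom = vmovlb_s16(v)
top = vmovlt_s16(v)
in_order = extend_low_half(v)
```

In this model, every output lane of the t/b extends is computed from the 32-bit chunk at the same position, while the in-order extend has to move data between chunks.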

Just adding up instructions will give a rough indication of cost. In some places there will be more to it, depending on the CPU, but the M in MVE stands for M-Profile (err, I think), so in many ways it's fairly simple.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81340/new/

https://reviews.llvm.org/D81340




