[PATCH] D158487: [PowerPC][altivec] Optimize codegen of vec_promote

Wed Aug 23 14:04:18 PDT 2023

nemanjai accepted this revision.
nemanjai added a comment.
This revision is now accepted and ready to land.

LGTM. This is a good idea and we should go ahead with it for anyone who uses `vec_promote`, but it might also be worth improving codegen for the insert itself, which is likely the more common case.

================
Comment at: llvm/test/CodeGen/PowerPC/vec-promote.ll:43
+
+define noundef <4 x float> @vec_promote_float_zeroed(ptr nocapture noundef readonly %p) {
+; CHECK-BE-LABEL: vec_promote_float_zeroed:
----------------
This code is absolutely terrible. Not only is the `lfs` much slower than the `lfiwzx`/`lxsiwzx` that we actually want, but the two conversions and three permutes are expensive as well.

I think changing `altivec.h` to produce better code for a case like this is a good thing, but I wonder whether the same pattern might come up in other contexts.

At least on Power9 and up, we can do much better than this. We don't do particularly well regardless of whether we're using a zero vector input or an arbitrary vector: https://godbolt.org/z/79fx8nsdP
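
To make the two cases concrete, here is a minimal portable C model of the semantics under discussion. It is only a sketch: the type `v4f_model` and both function names are hypothetical, plain arrays stand in for vector registers, and this is not the `altivec.h` implementation. It assumes the documented behaviour that `vec_insert` replaces the element at the index taken modulo the element count, and that `vec_promote` leaves every other element undefined, so seeding with zero is just one legal choice.

```c
/* Hypothetical portable stand-in for a 4 x float vector register. */
typedef struct {
    float e[4];
} v4f_model;

/* Models vec_insert(a, v, idx): a copy of v with element (idx mod 4)
 * replaced by a. This is the "arbitrary vector" case. */
static v4f_model model_vec_insert_f32(float a, v4f_model v, int idx) {
    v.e[idx & 3] = a;
    return v;
}

/* Models vec_promote(a, idx) seeded from a zero vector. The real
 * intrinsic leaves the remaining elements undefined, so zero-filling
 * them is an assumption made here for illustration only. */
static v4f_model model_vec_promote_f32_zeroed(float a, int idx) {
    v4f_model zero = {{0.0f, 0.0f, 0.0f, 0.0f}};
    return model_vec_insert_f32(a, zero, idx);
}
```

In this model the zeroed promote is just an insert into a known-constant vector, which is why better codegen for the general insert would cover both cases.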

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158487/new/

https://reviews.llvm.org/D158487