[PATCH] D86819: [PowerPC][Power10] Implementation of 128-bit Binary Vector Rotate builtins

Thu Sep 10 07:48:24 PDT 2020

nemanjai added inline comments.

================
Comment at: clang/lib/Headers/altivec.h:7743
+  return __builtin_altivec_vrlqnm(__a, ((__c << ShiftMask) |
+                                        (__b << ShiftRotation)));
+}
----------------
While correct, this implementation requires two constant pool loads (one per shift amount), two `vrlq` instructions to shift the two vectors, and finally an `xxlor` to OR the results together. We should be able to do this with a single constant pool load and a single `vperm`.
Presumably the implementation would be something like:
```
  // Merge __b and __c using an appropriate shuffle.
  vector unsigned char TmpB = (vector unsigned char)__b;
  vector unsigned char TmpC = (vector unsigned char)__c;
  vector unsigned char MaskAndShift =
#ifdef __LITTLE_ENDIAN__
      __builtin_shufflevector(TmpB, TmpC, -1, -1, -1, -1, -1, -1, -1, -1, 16, 1,
                              0, -1, -1, -1, -1, -1);
#else
      __builtin_shufflevector(TmpB, TmpC, -1, -1, -1, -1, -1, 30, 31, 15, -1,
                              -1, -1, -1, -1, -1, -1, -1);
#endif
  return __builtin_altivec_vrlqnm(__a, MaskAndShift);
```
(but of course, double-check that the shuffle indices are correct for both endiannesses).
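
One way to double-check the indices without Power10 hardware is a small portable model of the two-input byte shuffle. The sketch below is only a model: the helper names (`model_shuffle_bytes`, `fill_pattern`, `count_selected_lanes`) are hypothetical, don't-care lanes (`-1`) are written as 0 even though the real result is undefined there, and it does not model `vrlqnm` itself. It only shows which byte of `__b` or `__c` ends up in which lane for the index lists copied from the suggestion above.

```c
#include <assert.h>
#include <stdint.h>

enum { VEC_BYTES = 16 };

/* Index lists copied verbatim from the two branches of the suggestion. */
static const int kLittleEndianIdx[VEC_BYTES] = {-1, -1, -1, -1, -1, -1, -1, -1,
                                                16, 1,  0,  -1, -1, -1, -1, -1};
static const int kBigEndianIdx[VEC_BYTES] = {-1, -1, -1, -1, -1, 30, 31, 15,
                                             -1, -1, -1, -1, -1, -1, -1, -1};

/* Model of a two-input byte shuffle: index 0..15 selects a byte of b,
   16..31 selects a byte of c, and -1 marks a don't-care lane.
   Assumption: don't-care lanes are set to 0 to keep the model deterministic. */
static void model_shuffle_bytes(const uint8_t b[VEC_BYTES],
                                const uint8_t c[VEC_BYTES],
                                const int idx[VEC_BYTES],
                                uint8_t out[VEC_BYTES]) {
  for (int i = 0; i < VEC_BYTES; ++i) {
    assert(idx[i] >= -1 && idx[i] < 2 * VEC_BYTES);
    if (idx[i] < 0)
      out[i] = 0;
    else if (idx[i] < VEC_BYTES)
      out[i] = b[idx[i]];
    else
      out[i] = c[idx[i] - VEC_BYTES];
  }
}

/* Fills v with base, base + 1, ... so each byte's origin is recognizable. */
static void fill_pattern(uint8_t v[VEC_BYTES], uint8_t base) {
  for (int i = 0; i < VEC_BYTES; ++i)
    v[i] = (uint8_t)(base + i);
}

/* Counts the lanes that select a real byte (index other than -1). */
static int count_selected_lanes(const int idx[VEC_BYTES]) {
  int n = 0;
  for (int i = 0; i < VEC_BYTES; ++i)
    if (idx[i] >= 0)
      ++n;
  return n;
}
```

Filling `__b` with `0x00..0x0F` and `__c` with `0x80..0x8F` makes the source of each selected byte obvious, so the lanes can be compared against the byte positions that `vrlqnm` reads for the mask and shift fields.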

================
Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:996
+  // CHECK-COMMON-LABEL: @test_vec_rlnm_s128(
+  // CHECK-COMMON: call <1 x i128> @llvm.ppc.altivec.vrlqnm(<1 x i128>
+  // CHECK-COMMON-NEXT: ret <1 x i128>
----------------
Please show the shift in the test case as well.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D86819