[PATCH] D146212: [AArch64] Use NEON's tbl1 for 16xi8 and 8xi8 build vector with mask.

Lawrence Benson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 21 06:12:50 PDT 2023


lawben added a comment.

@dmgreen Thanks for the feedback. I've addressed the comments you made. I've renamed the method, as it does not require an `AND` anymore.

re: update to base against main. I tried to merge the current main branch into another patch (using git) and completely broke my setup, i.e., `arc` is doing really weird things, trying to lint hundreds of files, etc. So I'm not sure what the correct way is to do this. Do you maybe have a pointer for me? This is my first patch, so I'm new to this setup.



================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10218
+  // directly to TBL1.
+  if (VT.getSimpleVT() != MVT::v16i8 && VT.getSimpleVT() != MVT::v8i8)
     return SDValue();
----------------
dmgreen wrote:
> This can test against the EVT directly: `VT != MVT::v16i8 && .. `.
> Other types could be handled too, by breaking the larger type up into i8 lanes. i.e a i32 would add `[idx*4,idx*4+1,idx*4+2,idx*4+3]` for example, if you get my meaning.
> I'm not sure if that is worth adding to this patch, or handling in another but it would be good to keep in mind.
I also had this in mind but was thinking about putting it in another patch. `v16i8` and `v8i8` directly map, so they are easier to generate. Other sizes have some more overhead, so it makes sense to split them up. I've seen the logic for this in X86 instruction selection, so I guess that could just be reused. I'll do that in a follow-up, once this is merged.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10258
 
     if (MaskSource.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
       return SDValue();
----------------
dmgreen wrote:
> Should something check that the inner extract element (MaskSource) has an index equal to i?
Yes, good catch. I've added a check and a test.


================
Comment at: llvm/test/CodeGen/AArch64/neon-shuffle-vector-tbl.ll:12
 
   %masked_mask = and <16 x i8> %mask, <i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15>
   %1 = extractelement <16 x i8> %masked_mask, i64 0
----------------
dmgreen wrote:
> It would be good to have a test case without this And mask, just using %mask directly.
Makes sense. There is actually no reason to limit this to and-masked masks. So I added a path that either expects all elements to have an `AND` mask or none. If there are no `AND`s, we can omit it and directly emit a `tbl`. I was looking at this from Clang's `__builtin_shufflevector()` (as this is my use) and that always adds an `AND`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146212/new/

https://reviews.llvm.org/D146212



More information about the llvm-commits mailing list