[PATCH] D156350: [X86] Allow pre-SSE41 targets to extract multiple v16i8 elements coming from the same DWORD/WORD super-element

Mon Jul 31 02:41:26 PDT 2023

pengfei added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:20695
+  // TODO: Add QWORD MOVQ extraction?
+  if (VT.getSizeInBits() == 8) {
+    APInt DemandedElts = getExtractedDemandedElts(Vec.getNode());
----------------
Why do we use `getSizeInBits` rather than checking for `i8`?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:20713
     int WordIdx = IdxVal / 2;
-    SDValue Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16,
-                              DAG.getBitcast(MVT::v8i16, Vec),
-                              DAG.getIntPtrConstant(WordIdx, dl));
-    int ShiftVal = (IdxVal % 2) * 8;
-    if (ShiftVal != 0)
-      Res = DAG.getNode(ISD::SRL, dl, MVT::i16, Res,
-                        DAG.getConstant(ShiftVal, dl, MVT::i8));
-    return DAG.getNode(ISD::TRUNCATE, dl, VT, Res);
+    if (DemandedElts == (DemandedElts & (3 << (WordIdx * 2)))) {
+      SDValue Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16,
----------------
This part isn't clear to me. Because of the new restriction, the old code should have more chances to generate an SRL than the new code. Which one is better? I couldn't find a test case that shows the difference.
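To make the question concrete, here is a minimal scalar model of what the two versions compute. It is a sketch under stated assumptions, not the patch's code: it assumes x86 little-endian lane order, and the helpers `extractByteViaWord` and `demandedWithinWord` are hypothetical names. The old path always extracts byte `IdxVal` through its 16-bit word (EXTRACT_VECTOR_ELT on v8i16, then SRL when the byte is the high half, then TRUNCATE). The new path takes that route only when every demanded byte lies inside the same word, which is what the `3 << (WordIdx * 2)` mask test checks.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the word-based byte extraction
// (assumes little-endian lanes, as on x86).
uint8_t extractByteViaWord(const std::array<uint8_t, 16> &Vec, unsigned IdxVal) {
  unsigned WordIdx = IdxVal / 2;
  // Byte 2*w is the low half of word w; byte 2*w+1 is the high half.
  uint16_t Word = static_cast<uint16_t>(Vec[2 * WordIdx] |
                                        (Vec[2 * WordIdx + 1] << 8));
  unsigned ShiftVal = (IdxVal % 2) * 8;
  if (ShiftVal != 0)
    Word = static_cast<uint16_t>(Word >> ShiftVal);  // models the SRL
  return static_cast<uint8_t>(Word);                 // models the TRUNCATE
}

// Hypothetical model of the new guard: true only if every set bit of the
// 16-lane demanded mask is one of the two bytes belonging to word WordIdx.
bool demandedWithinWord(uint16_t DemandedElts, unsigned WordIdx) {
  uint16_t WordMask = static_cast<uint16_t>(3u << (WordIdx * 2));
  return DemandedElts == (DemandedElts & WordMask);
}
```

Both versions return the same byte whenever the word path is taken; the difference is only in how often that path, and therefore the SRL, is chosen.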

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156350