[PATCH] D81343: [AArch64] custom lowering for i128 popcount
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jun 7 20:48:39 PDT 2020
efriedma added inline comments.
================
Comment at: llvm/test/CodeGen/AArch64/popcount.ll:10
+; CHECK-NEXT: add x8, x0, #8 // =8
+; CHECK-NEXT: ld1 { v0.d }[1], [x8]
+; CHECK-NEXT: cnt v0.16b, v0.16b
----------------
shawnl wrote:
> shawnl wrote:
> > shawnl wrote:
> > > efriedma wrote:
> > > > Why are we generating two loads here? Something related to the BITCAST legalization?
> > > Yes, it should be:
> > >
> > > > ldr q0, [x0]
> > >
> > >
> > Yes, it is going to AArch64ISelLowering.cpp:14006
> >
> >
> > ```
> > case ISD::LOAD: {
> > assert(SDValue(N, 0).getValueType() == MVT::i128 &&
> > "unexpected load's value type");
> > LoadSDNode *LoadNode = cast<LoadSDNode>(N);
> > if (!LoadNode->isVolatile() || LoadNode->getMemoryVT() != MVT::i128) {
> > // Non-volatile loads are optimized later in AArch64's load/store
> > // optimizer. // <======This is not happening
> > return;
> > }
> >
> > SDValue Result = DAG.getMemIntrinsicNode(
> > AArch64ISD::LDP, SDLoc(N),
> > DAG.getVTList({MVT::i64, MVT::i64, MVT::Other}),
> > {LoadNode->getChain(), LoadNode->getBasePtr()}, LoadNode->getMemoryVT(),
> > LoadNode->getMemOperand());
> >
> > SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, SDLoc(N), MVT::i128,
> > Result.getValue(0), Result.getValue(1));
> > Results.append({Pair, Result.getValue(2) /* Chain */});
> > return;
> > }
> > ```
> With `-O0` it outputs:
>
> ```
> ldr x8, [x0, #8]
> ldr d0, [x0]
> // implicit-def: $q1
> mov v1.16b, v0.16b
> mov v1.d[1], x8
> ```
Hmm. Really, I think the problem reduces to something like the following, which generates a similar ldr+add+ld1 sequence:
define <2 x i64> @z(i64* nocapture nonnull readonly %p) {
%b = load i64, i64* %p
%p2 = getelementptr i64, i64* %p, i64 1
%bb = load i64, i64* %p2
%r1 = insertelement <2 x i64> zeroinitializer, i64 %b, i32 0
%r2 = insertelement <2 x i64> %r1, i64 %bb, i32 1
ret <2 x i64> %r2
}
X86ISelLowering.cpp has some code specifically to handle this; see EltsFromConsecutiveLoads. Maybe some of it should be ported to AArch64.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D81343/new/
https://reviews.llvm.org/D81343
More information about the llvm-commits
mailing list