[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos
Kristof Beyls via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 7 06:57:25 PST 2017
kristof.beyls added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:71-73
+ // This is used to cache instruction replacement decisions within function
+ // units and across function units.
+ std::map<unsigned, bool> SIMDInstrTable;
----------------
az wrote:
> kristof.beyls wrote:
> > The cache here will also need to take into account the TargetSubtargetInfo as the key into the cache.
> > Otherwise, wouldn't you get unexpected results for e.g. the following .ll file with a mixture of targeting different subtargets (see "target-cpu" function attributes)?
> >
> > ```
> > declare <2 x float> @llvm.fma.v2f32(<2 x float>, <2 x float>, <2 x float>)
> >
> > define <2 x float> @test_vfma_lane_f32_a57(<2 x float> %a, <2 x float> %b, <2 x float> %v)
> > "target-cpu"="cortex-a57" {
> > entry:
> > %lane = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
> > %0 = tail call <2 x float> @llvm.fma.v2f32(<2 x float> %lane, <2 x float> %b, <2 x float> %a)
> > ret <2 x float> %0
> > }
> >
> > define <2 x float> @test_vfma_lane_f32_exynos(<2 x float> %a, <2 x float> %b, <2 x float> %v)
> > "target-cpu"="exynos-m1" {
> > entry:
> > %lane = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
> > %0 = tail call <2 x float> @llvm.fma.v2f32(<2 x float> %lane, <2 x float> %b, <2 x float> %a)
> > ret <2 x float> %0
> > }
> > ```
> >
> > I have no idea how to get a stable key from TargetSubtargetInfo to use in this cache.
> >
> > At this point, I'd measure the overhead of this pass e.g. using CTMark to make sure that the caching mechanism is actually needed before investing more time in it.
> >
> I am adding subtarget as part of key.
Thanks! Could you also add a regression test similar to the LLVM-IR I wrote in my original comment about this to make sure this functionality is tested?
================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:163-170
+bool AArch64SIMDInstrOpt::
+shouldReplaceInst(MachineFunction *MF, const MCInstrDesc *InstDesc,
+ SmallVectorImpl<const MCInstrDesc*> &InstDescRepl) {
+ // Check if replacement decision is already available in the cached table.
// if so, return it.
- if (!VecInstElemTable.empty() &&
- VecInstElemTable.find(InstDesc->getOpcode()) != VecInstElemTable.end())
- return VecInstElemTable[InstDesc->getOpcode()];
+ if (!SIMDInstrTable.empty() &&
+ SIMDInstrTable.find(InstDesc->getOpcode()) != SIMDInstrTable.end())
----------------
az wrote:
> kristof.beyls wrote:
> > Is the only reason you need to drop const from shouldReplaceInst and other methods that you're using operator[] here to get to an element that's guaranteed to be in the map?
> > If so, wouldn't it be better to use syntax "*SIMDInstrTable.find(InstDesc->getOpcode())", so that the const can remain on all those methods?
> Yes, I was having trouble setting SIMDInstrTable[...] to true or false (few lines below the line of code you mention above). It thinks that the map member variable I added to the const class has the constant modifier too.
Thanks, that makes sense to me now.
================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:333-353
+ return false;
+ ReplInstrMCID.clear();
+ OriginalMCID = &TII->get(AArch64::ST4Fourv16b);
+ for (unsigned i=0; i<4; i++) {
+ ReplInstrMCID.push_back(&TII->get(AArch64::ZIP1v16i8));
+ ReplInstrMCID.push_back(&TII->get(AArch64::ZIP2v16i8));
+ }
----------------
az wrote:
> kristof.beyls wrote:
> > From a logical point of view, this is a lot of code that repeats information encoded in a different way in the big switch statements in optimizeLdStInterleave.
> > It'll be easy to update one location in the future and forget the other location.
> > Couldn't the logic in the switch statements in optimizeLdStInterleave be reused here somehow, so that the information is only encoded once?
> Good suggestion to simplify that code. There are several ways to improve it and here is one way: store the re-writing rules in a table member variable of the class and use that table in shouldExitEarly and in optimizeLdStInterleave. Note that I have not changed optimizeLdStInterleave yet (only used new table in shouldExitEarly). I would like to make sure that what I did is the kind of changes you had originally in mind and if so, then I will use it in optimizeLdStInterleave.
>
Yes, this is the kind of change I had in mind.
I didn't try to figure out if everything could be driven from such a table, but what you wrote in shouldExitEarly is roughly what I had in mind (with a follow-on nit-pick about making the code even more data-driven in my other comment on the new code).
================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:267-278
+ for (unsigned i=0; i<MaxInstrKind; i++)
+ {
+ OriginalMCID = &TII->get(IRI[i].OrigOpc);
+ if (IRI[i].AccT == TwoVectors) {
+ ReplInstrMCID.push_back(&TII->get(IRI[i].ReplOpc[0]));
+ ReplInstrMCID.push_back(&TII->get(IRI[i].ReplOpc[1]));
+ ReplInstrMCID.push_back(&TII->get(IRI[i].ReplOpc[2]));
----------------
I like the table-driven approach a lot better - the code is a lot cleaner, easier to read and easier to maintain.
However, wouldn't it be even cleaner to not let the code behave special based on TwoVectors/FourVectors?
I guess this would mean spelling out the whole sequence of replacement instructions in the table, instead of using the 'for (unsigned j=0; j<4, j++)' loop here, and it would make the table larger?
I still guess that may be the right tradeoff to make, as it will make it easier to understand what the patch does from just reading the table.
And I guess it will also make it easier to later add support for other replacement patterns?
https://reviews.llvm.org/D38196
More information about the llvm-commits
mailing list