[PATCH] [AArch64] Improve codegen of store lane instructions by avoiding GPR usage

James Molloy james.molloy at arm.com
Tue Dec 23 02:35:31 PST 2014


Hi Ahmed,

This patch would be better split into two. The first one seems fine to me.

For the second one, my local tinkering suggests that you shouldn't need the "Non0" stuff at all.

You want to raise ST1Lane's AddedComplexity from 15 to 19, because the integer STRWro also has AddedComplexity 15 (and it matches more nodes in the DAG, so it has a higher overall complexity). This is fine, although it does show AddedComplexity to be a very blunt hammer.

But if you do that you end up with suboptimal code for lane 0:

  add	x8, x0, w1, sxtw #2
  st1	{ v0.s }[0], [x8]

(This is different from the code snippet you pasted earlier. I get the above solely from changing the AddedComplexity from 15 to 19 on the ST1Lane patterns.)

So surely all you need to do is give your new Lane0 patterns an AddedComplexity of 19. That is enough because your new patterns will match at least one more node than the ST1 patterns (they match a RO addressing mode), so they end up with the higher overall complexity for lane 0.

This way, you're just telling SDAG "Here is a new pattern, it has the same AddedComplexity as the other one, choose at will". That is better than telling it that it can never pick an ST1Lane for index 0.
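
As a rough illustration of that reasoning, here is a toy Python model. It is not LLVM code: it assumes a pattern's rank is simply "nodes matched + AddedComplexity", and the node counts (3, 4, 5) are made up so that only their ordering follows what I described above. The `Pattern` class, `pick` function and `lane0_only` flag are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    matched_nodes: int     # hypothetical node count, chosen for illustration only
    added_complexity: int  # the AddedComplexity attached to the pattern
    lane0_only: bool = False

    @property
    def score(self) -> int:
        # Toy ranking (an assumption): nodes matched plus AddedComplexity.
        return self.matched_nodes + self.added_complexity


def pick(patterns, lane):
    """Return the highest-scoring pattern applicable to a store of `lane`."""
    usable = [p for p in patterns if lane == 0 or not p.lane0_only]
    return max(usable, key=lambda p: p.score)


# Node counts are invented; only their relative order mirrors the email.
STRWRO = Pattern("STRWro", matched_nodes=5, added_complexity=15)
ST1LANE_15 = Pattern("ST1Lane", matched_nodes=3, added_complexity=15)
ST1LANE_19 = Pattern("ST1Lane", matched_nodes=3, added_complexity=19)
LANE0_RO_19 = Pattern("Lane0 RO store", matched_nodes=4,
                      added_complexity=19, lane0_only=True)

# At AddedComplexity 15 the integer store outranks ST1Lane.
print(pick([STRWRO, ST1LANE_15], lane=1).name)
# Raised to 19, ST1Lane wins, including for lane 0 (the add + st1 sequence).
print(pick([STRWRO, ST1LANE_19], lane=0).name)
# With the new pattern also at 19, lane 0 prefers it; other lanes keep ST1Lane.
print(pick([STRWRO, ST1LANE_19, LANE0_RO_19], lane=0).name)
print(pick([STRWRO, ST1LANE_19, LANE0_RO_19], lane=1).name)
```

The point of the sketch is that no "never use ST1Lane for lane 0" rule is needed: giving the new pattern the same AddedComplexity lets the extra matched node decide.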

Does that sound sensible?

Cheers,

James


http://reviews.llvm.org/D6202
