[Patch][ARM] Fix and enable the Load/Store optimisation pass for Thumb1

James Molloy james.molloy at arm.com
Wed May 14 04:50:43 PDT 2014


Hi Moritz,

Thanks for working on this! Lack of LDM/STM on v6m is pretty bad for
performance across the board, not just on Dhrystone.

For the list's benefit: as this is Moritz's first LLVM patch, I reviewed it
internally until I was happy with it. I personally think it's fine to be
committed, although on a second look I will no doubt find nits that I
missed the first time round.

Would anyone (Renato? Tim?) like to review this further or is my LGTM
enough?

Cheers,

James

> -----Original Message-----
> From: Moritz Roth
> Sent: 14 May 2014 11:33
> To: llvm-commits at cs.uiuc.edu
> Cc: James Molloy
> Subject: [Patch][ARM] Fix and enable the Load/Store optimisation pass for
> Thumb1
> 
> Hi all,
> 
> this is a set of patches to add support for Thumb1 targets in the
> Load/Store optimisation pass, and re-enable that pass as well as inline
> memcpy expansion.
> Below is a short description of each patch:
> 
> 0001 - This patch fixes a few comment typos and other style issues I
> addressed while working on this. It's fairly small, and there is no
> intended functionality change.
> 
> 0002 - This patch re-enables the Load/Store optimisation pass for
> Thumb1-only targets. Since the actual change to the algorithm isn't in
> this patch yet, the pass simply returns and does nothing if invoked for
> such a target. Essentially, the place where the pass is disabled for
> Thumb1 is just moved down into the actual pass, so patch 0003 can easily
> make it *actually* do something. Again, there is no intended
> functionality change.
> 
> 0003 - This is the main patch - it adds support in the Load/Store
> optimisation pass to correctly generate Thumb1 LDMIA/STMIA instructions
> and fully enables the pass.
> The reason this was disabled before is that the current algorithm always
> generates non-writeback Load/Store multiples first, and then tries to
> merge any applicable base register updates into the LDM/STM. Thumb1 only
> has LDM/STM with base register writeback, so this approach doesn't
> really work there. In a nutshell, my patch directly generates the Thumb1
> tLDMIA[_UPD] and tSTMIA_UPD instructions. It then scans over the current
> block and tries to update any future instructions that read the base
> register with the new offset added from the writeback. If this isn't
> possible, the base register is reset right before the next instruction
> that uses it. The later (base-writeback merge) stages of the pass aren't
> applicable to Thumb1, so they're not executed.
> 
> This is a rather large patch and there are many details I've left out
> here. I'll put a more detailed description of the changes on Phabricator
> for review shortly.
> There is no intended functionality change for non-Thumb1 targets. I've
> added some tests to check that the pass is working - but note that there
> is another set of test cases for this (and memcpy expansion) in patch
> 0004. There's also a fix for a failing test where two instructions were
> being merged by the algorithm.
> 
> 0004 - This patch re-enables inline memcpy expansion for Thumb1. It was
> disabled for Thumb1 since the Load/Store optimisation pass was disabled.
> There are also test cases to make sure that small memcpys are inlined,
> and that the resulting chains of LDR/STR are merged correctly into
> LDM/STM (see patch 0003). This patch should only be applied once 0003 is
> committed.
> 
> Finally, regarding code size / performance impact: This patch has an
> impact on certain benchmarks that do lots of memcpy. By itself, it seems
> to give a ~7% improvement in Dhrystone. Together with some trickery to
> make clang align global strings at word boundaries (this allows a
> further memcpy to be inlined), there's a ~25% overall speed-up.
> 
> Cheers
> Moritz
> 
> PS: Sorry for the disclaimer, still working on getting that removed from
> my work email account.
