[PATCH] MachineCSE: Add a target query for the LookAheadLimit heuristic
Matthias Braun
matze at braunis.de
Mon May 4 15:11:42 PDT 2015
In http://reviews.llvm.org/D9472#165495, @tstellarAMD wrote:
> In http://reviews.llvm.org/D9472#165425, @MatzeB wrote:
>
> > Hi Tom,
> >
> > I don't know the history here, but since this scans forward for each instruction of the basic block, the limit looks like a way to avoid quadratic runtime behavior in (corner) cases with thousands of instructions in a basic block. I think it is no problem to go to a much higher limit than 5. But why go completely boundless? Do you need a guarantee here that the CSE is happening?
>
>
> Yes, I would like a guarantee that CSE is happening. For AMD GPUs, there is a control register (m0), which is used to clamp memory addresses to avoid out-of-bounds reads and writes. Before each load/store instruction, we emit: s_mov_b32 m0, -1 (-1 disables address clamping) and then rely on MachineCSE to eliminate all the unnecessary moves.
I assume SelectionDAG CSE isn't enough for you because it does not cover multiple blocks. I'm not happy about the quadratic runtime resulting from an unbounded search, but as this is a target-specific option, the patch LGTM.
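
To make the trade-off concrete, here is a minimal toy sketch of a bounded forward scan. It is not the MachineCSE implementation: instructions are modeled as plain strings, and the names Inst, removeRedundantMoves and countMoves are hypothetical, introduced only for illustration. It assumes that any other write to m0 invalidates the earlier value and ends the scan.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: just its textual form (assumption).
using Inst = std::string;

// Sentinel for "no limit"; hypothetical, not an LLVM constant.
constexpr std::size_t Unbounded = std::numeric_limits<std::size_t>::max();

// Hypothetical toy pass: for each copy of Mov, look at no more than
// LookAheadLimit following instructions and delete identical copies found
// in that window. With a limit the work is O(n * limit); with Unbounded it
// can degrade to O(n^2) on a very long block.
std::vector<Inst> removeRedundantMoves(std::vector<Inst> Block, const Inst &Mov,
                                       std::size_t LookAheadLimit) {
  const std::string M0Write = "s_mov_b32 m0,";
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Block[I] != Mov)
      continue;
    std::size_t Scanned = 0;
    std::size_t J = I + 1;
    while (J < Block.size() && Scanned < LookAheadLimit) {
      ++Scanned;
      if (Block[J] == Mov) {
        // Same value is already in m0: this move is redundant.
        Block.erase(Block.begin() + static_cast<std::ptrdiff_t>(J));
        continue; // J now names the next instruction.
      }
      // A different write to m0 clobbers the value, so stop scanning.
      if (Block[J].compare(0, M0Write.size(), M0Write) == 0)
        break;
      ++J;
    }
  }
  return Block;
}

// Counts how many copies of Mov remain in Block.
std::size_t countMoves(const std::vector<Inst> &Block, const Inst &Mov) {
  return static_cast<std::size_t>(std::count(Block.begin(), Block.end(), Mov));
}
```

In this model, a block with the move followed by ten unrelated instructions and then a second move keeps both moves with a limit of 5, and only one with Unbounded. That is the guarantee Tom is asking for, paid for with the longer scan.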
REPOSITORY
rL LLVM
http://reviews.llvm.org/D9472