[PATCH] MachineCSE: Add a target query for the LookAheadLimit heuristic

Mon May 4 15:11:42 PDT 2015

In http://reviews.llvm.org/D9472#165495, @tstellarAMD wrote:

> In http://reviews.llvm.org/D9472#165425, @MatzeB wrote:
>
> > Hi Tom,
> >
> > I don't know the history here, but since this scans forward for each instruction of the basic block, it looks like a way to avoid quadratic runtime behavior in (corner) cases with thousands of instructions in a basic block. I think it is no problem to go to a much higher limit than 5. But why go completely unbounded? Do you need a guarantee here that the CSE is happening?
>
>
> Yes, I would like a guarantee that CSE is happening. For AMD GPUs, there is a control register (m0), which is used to clamp memory addresses to avoid out-of-bounds reads and writes. Before each load/store instruction, we emit `s_mov_b32 m0, -1` (-1 disables address clamping) and then rely on MachineCSE to eliminate all the unnecessary moves.

I assume SelectionDAG CSE isn't enough for you because it does not cover multiple blocks. I'm not happy about the quadratic runtime resulting from an unbounded search, but as this is a target-specific option, the patch LGTM.
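
To make the trade-off concrete, here is a toy sketch of a forward scan with a look-ahead window. It is not MachineCSE's actual implementation: `Inst`, `sameValue`, `removeRedundantDefs`, and `kUnbounded` are hypothetical names invented for illustration, and the model only knows about "an instruction writes a register with an immediate". Each instruction scans at most `limit` instructions ahead, so the work is roughly O(n * limit); with no bound it degrades to O(n^2) on a long block.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Toy instruction (hypothetical model, not LLVM's MachineInstr):
// an opcode, the register it writes, and an immediate operand.
struct Inst {
  std::string opcode;
  std::string def;
  int imm;
};

// Stand-in for "no limit" on the forward scan.
const std::size_t kUnbounded = std::numeric_limits<std::size_t>::max();

// Two instructions compute the same value into the same register.
bool sameValue(const Inst &a, const Inst &b) {
  return a.opcode == b.opcode && a.def == b.def && a.imm == b.imm;
}

// Returns the block with redundant register writes removed. For each
// instruction we look at most `limit` instructions ahead for an identical
// write; the scan stops early if the register is overwritten with a
// different value.
std::vector<Inst> removeRedundantDefs(const std::vector<Inst> &block,
                                      std::size_t limit) {
  assert(limit > 0 && "a zero-sized window can never find a duplicate");
  std::vector<bool> dead(block.size(), false);

  for (std::size_t i = 0; i < block.size(); ++i) {
    if (dead[i])
      continue;
    std::size_t steps = 0;
    for (std::size_t j = i + 1; j < block.size() && steps < limit;
         ++j, ++steps) {
      if (sameValue(block[i], block[j])) {
        dead[j] = true;  // identical write, the earlier one still holds
      } else if (block[j].def == block[i].def) {
        break;           // register clobbered with a different value
      }
    }
  }

  std::vector<Inst> result;
  for (std::size_t i = 0; i < block.size(); ++i)
    if (!dead[i])
      result.push_back(block[i]);
  return result;
}
```

In this model, a block of `s_mov_b32 m0, -1` followed by three loads and a second `s_mov_b32 m0, -1` keeps both moves when `limit` is 2, because the duplicate sits outside the window. With `limit` of 4 or `kUnbounded`, the second move is removed. That is the guarantee Tom is asking for, and the unbounded case is where the quadratic cost comes from.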

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D9472
