[llvm-dev] [RFC] Making GVN able to visit the same block more than once

Wed Apr 18 08:50:55 PDT 2018

Introduction
============

I'm currently in the middle of what initially looked to be a simple change in
GVN but is turning out to have unexpected consequences that are turning out to
be quite difficult to resolve, so I thought I'd send out an RFC to make sure
that I'm not barking up the wrong tree with how I'm trying to do this.

Motivation and current behaviour
================================

The motiviating example here for what I'm trying to do is std::min_element, or
rather std::min_element after it's inlined into a function where the result is
dereferenced. If we have

  fload min_element_example(float *first, float *last) {
    return *std::min_element(first, last);
  }

then after inlining we have something that looks like

  float min_element(float *first, float *last)
  {
    for (float *p = first; p != last; ++p)
    {
      if (*p < *first)
        first = p;
    }
    return *first;
  }

What I want to happen here is eliminate those loads of *first and get something
that looks like

  float min_element_optimized(float *first, float *last)
  {
    float best = *first;
    for (float *p = first; p != last; ++p)
    {
      float pval = *p;
      if (pval < best)
        best = pval;
    }
    return best;
  }

This looks like partial redundancy elimination to me: if we look at the *first
in the loop then its value is one of:
 * The value of *first from the entry if this is the first iteration.
 * The value of *p from the previous iteration if the 'if' was true.
 * The value of *first from the previous iteration if the 'if' was false.

We have two available values (those in the loop) and one unavailable values
(the one from the entry block) so the value is partially redundant.

There are two problems here preventing GVN PRE from doing anything:
 * The 'if' becomes a select and GVN and MemoryDependenceAnalysis can't cope
   with selects. This isn't too hard I think, and I have a prototype for this
   which essentially treats them as kind of like a phi and that mostly works.
 * We need to look at the same block more than once, and have more than one
   value available in a block, and this completely breaks things. This is the
   part that this RFC is about.

Initial attempt
===============

If we turn off select generation (so we can ignore the first of the two
problems) then the IR we have coming into GVN looks like this:

  define float @min_element_phi(float* %first, float* %last) {
  entry:
    br label %for.body

  for.body:
    %p = phi float* [ %first, %entry ], [ %incdec.ptr, %for.inc ]
    %first.addr.1 = phi float* [ %first, %entry ], [ %first.addr.2, %for.inc ]
    %pval = load float, float* %p, align 4
    %firstval = load float, float* %first.addr.1, align 4
    %cmp1 = fcmp olt float %pval, %firstval
    br i1 %cmp1, label %if.then, label %for.inc

  if.then:
    br label %for.inc

  for.inc:
    %first.addr.2 = phi float* [ %p, %if.then ], [ %first.addr.1, %for.body ]
    %incdec.ptr = getelementptr inbounds float, float* %p, i64 1
    %cmp = icmp eq float* %incdec.ptr, %last
    br i1 %cmp, label %exit, label %for.body

  exit:
    %ret = load float, float* %first.addr.2, align 4
    ret float %ret
  }

GVN calls MemoryDependenceAnalysis::getNonLocalPointerDependency to find the
non-local dependencies of the load %ret. What it then does is:
 * Starts looking for an available value of *%first.addr.2 starting in exit.
 * Doesn't find anything in exit, goes to for.inc.
 * Doesn't find anything in for.inc, phi translates to be looking for a value of
   *%p in if.then.
 * Doesn't find anything in if.then, goes to for.body.
 * Finds the value of pval in for.body.
 * Goes back to for.inc and phi-translates to be looking for the value of
   *%first.addr.1 in for.body.
 * Gives up because we're already looked at for.body for %p.

The thing that keeps track of which blocks we've already looked at is
DenseMap<BasicBlock *, Value *> Visited, which is a map of a block to the value
that we looked at that block for. The way the logic works is that is we try to
visit a block that's already been visited:
 * If we previously visited it for the same value that we're now visiting it for
   then we can safely ignore the block.
 * If we're visiting it for a different value then we can't handle it and give
   up.

The approach I tried here is:
 * Turn Visited into a set of <BasicBlock*,Value*> pairs and keep the behaviour
   of ignoring blocks that we've already visited for a value but allow visiting
   a block more than once if it's for different values.
 * Adjust NonLocalDepResult to mark a result as originating in the block we
   started looking in (after phi-translation), not the one we eventually found
   it in.

With this GVN then goes on to optimise this function exactly as we want. However
we then get problems elsewhere.

The problem
============

Let's take a look at this function:

  define i32 @multiple_path_test(i32* %ptr1, i32* %ptr2, i32* %ptr3) {
  entry:
    %val1 = load i32, i32* %ptr1, align 4
    %val2 = load i32, i32* %ptr2, align 4
    %val3 = load i32, i32* %ptr3, align 4
    %cmp1 = icmp slt i32 %val1, %val2
    br i1 %cmp1, label %a, label %b

  a:
    %cmp2 = icmp slt i32 %val1, %val3
    br i1 %cmp2, label %end, label %b

  b:
    %bphi = phi i32* [ %ptr2, %a ], [ %ptr3, %entry ]
    br label %end

  end:
    %retptr = phi i32* [ %ptr1, %a ], [ %bphi, %b ]
    %ret = load i32, i32* %retptr, align 4
    ret i32 %ret
  }

Here this load of %ret is fully redundant: on every path from entry to end there
is a load which already has the value that %ret would load.

Currently in GVN after it does AnalyzeLoadAvailability what we have is:
 * %val1 is available in %entry (for the entry->a->end path)
 * %bphi is unknown in %b (we tried to visit %entry for %ptr3 but failed because
   we already visited it for %ptr1)

What then happens is:
 * We have one available and one unavailable value, so PRE is done
 * This causes a load of %bphi to be inserted in %b
 * We then analyze the load of %bphi in %b
 * This turns out to be fully redundant and is turned into a phi of %val2 and
   %val3.

With the above change to allow visiting the same block more than once then what
we get is is:
 * %val1 is available in %a (for the a->end edge)
 * %val2 is available in %a (for the a->b edge)
 * %val3 is available in %entry (for the entry->b edge)

What then happens is:
 * There are no unavailable values, so full redundancy elimination is done
 * ConstructSSAForLoadSet is called to do PHI construction
 * This then uses SSAUpdater
 * %val2 is set as the value for %a
 * %val1 is ignored as we already have a value for %a
 * %val3 is set as the value for %entry
 * Phi construction is done which completely ignores %val1

The problem here is that SSAUpdater follows the paradigm of one block = one
available value, where if a value is available in a block then all successors to
that block will use that value. This is not valid here, as for %a which value
you get depends on which successor you go to.

Currently I'm in the middle of modifying SSAUpdater and AvailableValueInBlock to
be able to understand that which value you get can depend on both the source
and destination block, and make it use that information in PHI construction.

Questions
=========

Some questions I have:
 * Is GVN actually the right place to do this transformation, or is there some
   other better place that I've missed?
 * I'm pretty sure I do need to do this "visit the same block more than once"
   thing, especially for select as the values on both sides of the select can
   be in the same block that the select is in so we have to visit the same
   block three times, but perhaps there's some other way to handle this?
 * Maybe instead of handling needing different values on different edges instead
   I should be making AnalyzeLoadAvailability (or something in
   MemoryDependenceAnalysis) recognise that this is happening and insert an
   'unknown' into the appropriate place (%b in the above example) so that we get
   the same behaviour that we currently have?

John