[LLVMdev] Suspicious behavior of mem2reg (promoteSingleBlockAlloca)

Jeehoon Kang jeehoon.kang at sf.snu.ac.kr
Fri Jul 17 00:16:47 PDT 2015


Hi LLVMDev,
this is Jeehoon Kang, a Ph.D. student of Software Foundations Laboratory (
http://sf.snu.ac.kr), Dept. of Computer Science & Engineering, Seoul
National University.  Our group studied the mem2reg pass, and we got a
question on its algorithm.

As far as I understand, the mem2reg pass essentially uses the SSA
construction algorithm to promote allocas into registers, but there are
shortcuts for some special cases.  One of the special cases is when an
alloca is "only used within a single basic block." (
http://llvm.org/docs/doxygen/html/PromoteMemoryToRegister_8cpp_source.html#l00435
)

But currently, I cannot understand the algorithm for this special case.  In
this case, the mem2reg pass "perform[s] a single linear pass over the basic
block using the Alloca."  In other words, a load is replaced by a read from
a register corresponding to the nearest preceding store.  The logic I
cannot understand is: "If there is no store before this load, the load
takes the undef value." (
http://llvm.org/docs/doxygen/html/PromoteMemoryToRegister_8cpp_source.html#l00471).
If the block is inside a loop, just writing the undef value is unsound, I
think.


Here is an example C code that I think Clang mis-compiles:
=====
void foo(unsigned char a, unsigned char b) {
  if (b != 2) printf("%d ", (int) a);
}
int main() {
  unsigned char a, b;
  for (unsigned char i = 0; i < 3; ++i) {
    a = b; // (*)
    b = 5;
    b = 2 + i;
    foo (a, b);
  }
  return 0;
}
=====
The mem2reg pass essentially promotes the variable "b" into register by the
shortcut for the special case I mentioned above.  In this case, the load
from "b" at (*) is replaced with the undef value, so "a = b;" is
(essentially) replaced with "a = <undef>;".  However, this changes the
semantics: the original program printed "2 3 ", while the transformed
program prints "0 0 ".

The original program prints "2 3 " for the following reasons.  In the first
iteration (i=0), a=<undef> and b=2, so "foo" does not print anything.  In
the second and the third iteration (i=1 and i=2, respectively), (a, b) =
(2, 3) and (3, 4), and "foo" prints 2 and 3, respectively.

On the other hand, the transformed program prints "0 0 ", since "a" always
equals to <undef> in "foo".  And the Clang compiler (at least version 3.4
through the development branch 3.7) chose to print "0" for the undef value
for this program.


I guess I misunderstand something here, since this case seems to be already
known to LLVM developers (
http://llvm.org/docs/doxygen/html/PromoteMemoryToRegister_8cpp_source.html#l00427).
But I failed to google further information on this issue (by query
"promoteSingleBlockAlloca").  So I asked here for your help.

Thank you,
Jeehoon

-- 
Jeehoon Kang (Ph.D. student) <http://sf.snu.ac.kr/jeehoon.kang>
Software Foundations Laboratory <http://sf.snu.ac.kr>
Seoul National University <http://www.snu.ac.kr>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150717/37ad4f4e/attachment.html>


More information about the llvm-dev mailing list