<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 5/9/2017 6:05 AM, Hiroshi 7 Inoue
via llvm-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:OFD9DD1F34.5D5DEBF6-ON4925811B.0046E68F-4925811B.0047F137@notes.na.collabserv.com">
<p><font size="2">Hi,</font><br>
<br>
<font size="2">I am working to improve SROA so that it generates
        better code when a function takes a struct as an argument. I
        would appreciate any suggestions or comments on how best to
        proceed with this optimization.</font><br>
<br>
<font size="2">* Problem *</font><br>
<font size="2">I observed that LLVM often generates redundant
        instructions around libstdc++’s istreambuf_iterator. The problem
        comes from scalar replacement of aggregates (SROA) in functions
        that take an aggregate as an argument. Here is a simplified
        example in C. </font><br>
<br>
<font size="2">struct record {</font><br>
<font size="2"> long long a;</font><br>
<font size="2"> int b;</font><br>
<font size="2"> int c;</font><br>
<font size="2">};</font><br>
<br>
<font size="2">int func(struct record r) {</font><br>
<font size="2"> for (int i = 0; i &lt; r.c; i++)</font><br>
<font size="2"> r.b++;</font><br>
<font size="2"> return r.b;</font><br>
<font size="2">}</font><br>
<br>
<font size="2">When updating r.b (or r.c), SROA generates
        redundant instructions on some platforms (such as x86_64 and
        ppc64); on these platforms, r.b and r.c are packed into one
        64-bit GPR when the struct is passed as a function argument. The
        problem arises when the same memory location is accessed by
        load/store instructions of different types.</font><br>
<font size="2">For this example, Clang generates the following IR
        to initialize the struct on ppc64 and x86_64. On both
        platforms, the 64-bit value is first stored into memory
        allocated by alloca. Later, the same memory location is accessed
        as 32-bit integer values (r.b and r.c).</font><br>
<br>
<font size="2">for ppc64</font><br>
<font size="2"> %struct.record = type { i64, i32, i32 }</font><br>
<br>
<font size="2"> define signext i32 @ppc64le_func([2 x i64]
%r.coerce) #0 {</font><br>
<font size="2"> entry:</font><br>
<font size="2"> %r = alloca %struct.record, align 8</font><br>
<font size="2"> %0 = bitcast %struct.record* %r to [2 x i64]*</font><br>
<font size="2"> store [2 x i64] %r.coerce, [2 x i64]* %0, align
8</font><br>
<font size="2"> ....</font><br>
<br>
<font size="2">for x86_64</font><br>
<font size="2"> define i32 @x86_64_func(i64 %r.coerce0, i64
%r.coerce1) #0 {</font><br>
<font size="2"> entry:</font><br>
<font size="2"> %r = alloca %struct.record, align 8</font><br>
<font size="2"> %0 = bitcast %struct.record* %r to { i64, i64 }*</font><br>
<font size="2"> %1 = getelementptr inbounds { i64, i64 }, { i64,
i64 }* %0, i32 0, i32 0</font><br>
<font size="2"> store i64 %r.coerce0, i64* %1, align 8</font><br>
<font size="2"> %2 = getelementptr inbounds { i64, i64 }, { i64,
i64 }* %0, i32 0, i32 1</font><br>
<font size="2"> store i64 %r.coerce1, i64* %2, align 8</font><br>
<font size="2"> ....</font><br>
<br>
<font size="2">For such a code sequence, the current SROA
        generates instructions that update only the upper (or lower)
        half of the 64-bit value when storing r.b (or r.c). SROA can
        split an i64 value into two i32 values under some conditions
        (e.g. when the struct contains only int b and int c in this
        example), but it cannot split more complex cases.</font><br>
</p>
</blockquote>
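<p>For anyone following along, the access pattern in the quoted IR
      can be imitated in plain C. This is only a rough sketch, not what
      Clang literally emits: the union record_bits and the function
      func_coerced are hypothetical names made up for illustration, and
      it assumes the struct occupies exactly two 64-bit words, as on
      x86_64 and ppc64.</p>
<pre>
```c
struct record {
    long long a;
    int b;
    int c;
};

/* Hypothetical helper type: views the same 16 bytes either as the struct
   or as the two 64-bit words that the ABI passes in registers. */
union record_bits {
    struct record rec;
    unsigned long long words[2];
};

_Static_assert(sizeof(struct record) == 2 * sizeof(unsigned long long),
               "sketch assumes a 16-byte struct record");

/* Mirrors the shape of the x86_64 signature: two 64-bit arguments
   instead of the struct itself. */
int func_coerced(unsigned long long coerce0, unsigned long long coerce1) {
    union record_bits r;
    r.words[0] = coerce0;   /* 64-bit store covering r.a */
    r.words[1] = coerce1;   /* 64-bit store covering both r.b and r.c */
    for (int i = 0; i < r.rec.c; i++)   /* 32-bit load of r.c */
        r.rec.b++;                      /* 32-bit load and store of r.b */
    return r.rec.b;
}
```
</pre>
<p>The second word is written once as a 64-bit value and then read
      and written as two 32-bit fields, which is the mixed-type access
      the quoted message describes.</p>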
<p>When there are accesses of mixed type to an alloca, SROA just
treats the whole alloca as a big integer, and generates PHI nodes
appropriately. In many cases, instcombine would then slice up the
generated PHI nodes to use more appropriate types, but that
doesn't work out here. (See
InstCombiner::SliceUpIllegalIntegerPHI.) Probably the right
solution is to make instcombine more aggressive here; it's hard to
come up with a generally useful transform in SROA without
reasoning about control flow.<br>
</p>
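<p>To illustrate the difference, here is a hand-written C sketch of
      the two shapes of code. It is an approximation of the idea, not
      actual compiler output. The names func_wide and func_sliced are
      hypothetical, the layout (r.b in the low 32 bits, r.c in the high
      32 bits of the second word) assumes a little-endian target, and
      unsigned arithmetic is used for r.b to keep the sketch free of
      signed overflow.</p>
<pre>
```c
/* Roughly what happens when the 8 bytes holding r.b and r.c are kept as
   one wide integer: the 64-bit value 'bc' is carried around the loop
   (one wide PHI), and every update of r.b becomes a truncate, an add,
   and a mask-and-or back into the wide value. */
int func_wide(unsigned long long bc) {
    for (int i = 0; i < (int)(bc >> 32); i++) {   /* shift to read r.c */
        unsigned int b = (unsigned int)bc;        /* truncate to read r.b */
        b = b + 1;                                /* r.b++ */
        bc = (bc & 0xFFFFFFFF00000000ULL) | b;    /* re-insert r.b */
    }
    return (int)(unsigned int)bc;
}

/* What we would like to end up with: the wide value is split once,
   before the loop, and only 32-bit values are carried around the loop. */
int func_sliced(unsigned long long bc) {
    unsigned int b = (unsigned int)bc;   /* r.b */
    int c = (int)(bc >> 32);             /* r.c */
    for (int i = 0; i < c; i++)
        b = b + 1;
    return (int)b;
}
```
</pre>
<p>Both functions compute the same result; the first one just does
      redundant shifting and masking on every iteration, which is the
      kind of code that slicing up the PHI is meant to clean up.</p>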
<p>-Eli<br>
</p>
<pre class="moz-signature" cols="72">--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
</body>
</html>