<div dir="ltr">We are careful not to try this optimization where it would extend the range of loaded memory; this is purely for what I call a "load doughnut". :)<br>Reading past either specified edge would be very bad because it could cause a memory fault / exception where there was none in the original program. That's definitely not legal.<br><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 12:20 PM, Craig, Ben <span dir="ltr"><<a href="mailto:ben.craig@codeaurora.org" target="_blank">ben.craig@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
I'm having a hard time finding any problems here, at least as long
as the value is in the middle. I wouldn't expect the contents of
x[2] to affect the timing or power usage of anything. I guess there
would be a minor "bad" side effect in that a memory read watchpoint
would trigger with the 128 bit load that wouldn't be there with the
32-bit loads. I think it is semantically very similar to this
situation as well...<br>
<blockquote>v4i32 first_call(int *x) { //use all of the array<br>
int f0 = x[0];<br>
int f1 = x[1];<br>
int f2 = x[2];<br>
int f3 = x[3];<br>
return (v4i32) { f0, f1, f2, f3 };<br>
}<br>
v4i32 second_call(int *x) { //use some of the array<br>
int s0 = x[0];<br>
int s1 = x[1];<br>
int s2 = 0;<br>
int s3 = x[3];<br>
return (v4i32) { s0, s1, s2, s3 };<br>
}<br>
first_call(x);<br>
second_call(x);<br>
</blockquote>
The implementation isn't going to zero out the stack in between
those calls, so for a short period of time, the memory location of
s2 will contain x[2].<br>
<br>
    I'm less sure if the gaps are on the edges.  I'm worried that you
    might end up crossing some important address boundary if you look
    at something earlier or later than what the user requested.<div><div class="h5"><br>
<br>
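As a rough illustration of the boundary concern above, here is a small C sketch that checks whether a byte range stays inside one page. The helper name and the 4096-byte page size used in the usage note are assumptions for illustration only; this is not LLVM code.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of LLVM: reports whether the byte range
 * [addr, addr + size) lies inside a single page of page_size bytes.
 * Assumes page_size is a power of two and addr + size does not wrap. */
bool range_stays_in_one_page(uintptr_t addr, size_t size, size_t page_size)
{
    assert(size > 0);
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t first_page = addr / page_size;
    uintptr_t last_page = (addr + size - 1) / page_size;
    return first_page == last_page;
}
```

For example, with 4096-byte pages, a 12-byte request that ends exactly at a page boundary stays in one page, but widening it to 16 bytes would touch the next page, which might not be mapped.<br>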
<div>On 3/16/2016 11:38 AM, Sanjay Patel
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Hi Ben -<br>
</div>
<br>
Thanks for your response. For the sake of argument, let's
narrow the scope of the problem to eliminate some of the
variables you have rightfully cited. <br>
<br>
Let's assume we're not dealing with volatiles, atomics, or FP
operands. We'll even guarantee that the extra loaded value is
            never used. This is, in fact, the scenario that <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a>
is concerned with.<br>
<br>
</div>
Related C example:<br>
<br>
typedef int v4i32 __attribute__((__vector_size__(16)));<br>
<br>
// Load some almost-consecutive ints as a vector.<br>
v4i32 foo(int *x) {<br>
int x0 = x[0];<br>
int x1 = x[1];<br>
// int x2 = x[2]; // U can't touch this? <br>
int x3 = x[3];<br>
return (v4i32) { x0, x1, 0, x3 };<br>
}<br>
<br>
<div>
<div>For x86, we notice that we have nearly a v4i32 vector's
worth of loads, so we just turn that into a vector load and
mask out the element that's getting set to zero:<br>
movups (%rdi), %xmm0 ; load 128-bits
instead of three 32-bit elements<br>
andps LCPI0_0(%rip), %xmm0 ; put zero bits into the
3rd element of the vector<br>
<br>
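To make the comparison concrete, here is a minimal portable C model of the two forms. It only models the values produced; the function names are made up for this sketch, and memcpy stands in for the 16-byte movups read.<br>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar form: reads only x[0], x[1], and x[3], as in foo(). */
void load_three_scalar(const int32_t *x, int32_t out[4])
{
    assert(x != NULL && out != NULL);
    out[0] = x[0];
    out[1] = x[1];
    out[2] = 0;
    out[3] = x[3];
}

/* Widened form: one 16-byte read (modelling movups), then an AND with a
 * constant mask (modelling andps) that clears lane 2. */
void load_three_widened(const int32_t *x, int32_t out[4])
{
    static const uint32_t lane_mask[4] = {
        0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0xFFFFFFFFu
    };
    uint32_t lanes[4];

    assert(x != NULL && out != NULL);
    memcpy(lanes, x, sizeof lanes);   /* this read also touches x[2] */
    for (int i = 0; i < 4; ++i)
        lanes[i] &= lane_mask[i];
    memcpy(out, lanes, sizeof lanes);
}
```

Both functions return the same four values for any input; the only difference is that the widened form reads x[2] from memory before discarding it.<br>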
</div>
<div>Should that optimization be disabled by a hypothetical
-fextra-secure flag?<br>
</div>
<div><br>
<br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Mar 16, 2016 at 7:59 AM, Craig,
Ben <span dir="ltr"><<a href="mailto:ben.craig@codeaurora.org" target="_blank">ben.craig@codeaurora.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Regarding accessing
extra data, there are at least some limits as to what can
be accessed. You can't generate extra loads or stores to
volatiles. You can't generate extra stores to atomics,
even if the extra stores appear to be the same value as
the old value.<br>
<br>
As for determining where the perf vs. security line should
be drawn, I would argue that most compilers have gone too
far on the perf side while optimizing undefined behavior.
Dead store elimination leaving passwords in memory,
integer overflow checks getting optimized out, and NULL
checks optimized away. Linus Torvalds was complaining
about those just recently on this list, and while I don't
share his tone, I agree with him regarding the harm these
optimizations can cause.<br>
<br>
If I'm understanding correctly, for your specific cases,
you are wondering if it is fine to load and operate on a
floating point value that the user did not specifically
request you to operate on. This could cause (at least)
two different problems. First, it could cause a floating
point exception. I think the danger of the floating point
exception should rule out loading values the user didn't
request. Second, loading values the user didn't specify
could enable a timing attack. The timing attack is scary,
but I don't think it is something we can really fix in the
general case. As long as individual assembly instructions
have impractical-to-predict execution times, we will be at
the mercy of the current hardware state. There are timing
attacks that can determine TLS keys in a different VM
instance based off of how quickly loads in the current
process execute. If our worst timing attack problems are
floating point denormalization issues, then I think we are
in a pretty good state.
<div>
<div><br>
<br>
<div>On 3/15/2016 10:46 AM, Sanjay Patel via llvm-dev
wrote:<br>
</div>
</div>
</div>
<blockquote type="cite">
<div>
<div>
<div dir="ltr">
<div>
<div>
<div>[cc'ing cfe-dev because this may require
some interpretation of language law]<br>
<br>
My understanding is that the compiler has
the freedom to access extra data in C/C++
(not sure about other languages); AFAIK, the
LLVM LangRef is silent about this. In C/C++,
this is based on the "as-if rule":<br>
<a href="http://en.cppreference.com/w/cpp/language/as_if" target="_blank">http://en.cppreference.com/w/cpp/language/as_if</a><br>
</div>
</div>
<br>
</div>
So the question is: where should the optimizer
draw the line with respect to perf vs. security if
it involves operating on unknown data? Are there
guidelines that we can use to decide this?<br>
<br>
<div>
<div>The masked load transform referenced below
is not unique in accessing / operating on
unknown data. In addition to the related
scalar loads -> vector load transform that
I've mentioned earlier in this thread, see for
example:<br>
<a href="https://llvm.org/bugs/show_bug.cgi?id=20358" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=20358</a><br>
<div>(and the security paper and patch review
linked there)<br>
</div>
<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Mar 14, 2016 at
10:26 PM, Shahid, Asghar-ahmad <span dir="ltr"><<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank">Asghar-ahmad.Shahid@amd.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Hi
Sanjay,</span></p>
<span>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>The
real question I have is whether it
is legal to read the extra memory,
regardless of whether this is a
masked load or </span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>something
else.</span></p>
</span>
                          <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">No,
                              it is not legal AFAIK, because by
                              doing that we are exposing memory
                              contents that the
                              programmer</span></p>
                          <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">did
                              not intend to expose. This may be
                              open to exploitation.</span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Shahid</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
<div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color blue;padding:0in 0in 0in 4pt">
<div>
<div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) -moz-use-text-color -moz-use-text-color;padding:3pt 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">
                                    llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
<b>On Behalf Of </b>Sanjay
Patel via llvm-dev<br>
<b>Sent:</b> Monday, March 14,
2016 10:37 PM<br>
<b>To:</b> Nema, Ashutosh<br>
<b>Cc:</b> llvm-dev<br>
<b>Subject:</b> Re: [llvm-dev]
masked-load endpoints
optimization</span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">I checked
in a patch to do this
transform for x86-only for
now:<br>
<a href="http://reviews.llvm.org/D18094" target="_blank">http://reviews.llvm.org/D18094</a>
/ <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a></p>
</div>
<div>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">On Fri,
Mar 11, 2016 at 9:57 AM,
                                      Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>>
wrote:</p>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12pt">Thanks,
Ashutosh.</p>
</div>
<p class="MsoNormal" style="margin-bottom:12pt">Yes,
either TTI or TLI
could be used to limit
the transform if we do
it in CGP rather than
the DAG.</p>
</div>
<p class="MsoNormal" style="margin-bottom:12pt">The
real question I have is
whether it is legal to
read the extra memory,
regardless of whether
this is a masked load or
something else.</p>
</div>
<p class="MsoNormal">Note
that the x86 backend
already does this, so
either my proposal is ok
for x86, or we're already
doing an illegal
optimization:</p>
<div>
<p class="MsoNormal" style="margin-bottom:12pt"><br>
define <4 x i32>
@load_bonus_bytes(i32*
%addr1, <4 x i32>
%v) {<br>
%ld1 = load i32, i32*
%addr1<br>
%addr2 = getelementptr
i32, i32* %addr1, i64 3<br>
%ld2 = load i32, i32*
%addr2<br>
%vec1 = insertelement
<4 x i32> undef,
i32 %ld1, i32 0<br>
%vec2 = insertelement
<4 x i32> %vec1,
i32 %ld2, i32 3<br>
ret <4 x i32>
%vec2<br>
}<br>
<br>
$ ./llc -o -
loadcombine.ll <br>
...<br>
movups (%rdi),
%xmm0<br>
retq<br>
<br>
<br>
</p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">On
Thu, Mar 10, 2016 at
10:22 PM, Nema,
                                              Ashutosh <<a href="mailto:Ashutosh.Nema@amd.com" target="_blank">Ashutosh.Nema@amd.com</a>>
wrote:</p>
<div>
<div>
                                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">This
                                                    looks
                                                    interesting.
                                                    The main
                                                    motivation
                                                    appears to be
                                                    replacing a
                                                    masked vector
                                                    load with a
                                                    general vector
                                                    load followed
                                                    by a select.</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Masked
                                                    vector loads
                                                    are generally
                                                    observed to be
                                                    more expensive
                                                    than a plain
                                                    vector
                                                    load.</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">But
                                                    if the first and
                                                    last elements
                                                    of a masked
                                                    vector load
                                                    are guaranteed
                                                    to be accessed,
                                                    then it can be
                                                    transformed into
                                                    a vector load.</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">In
opt this can
be driven by
TTI, where the
benefit of
this
transformation
should be
checked.</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Ashutosh</span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
<p class="MsoNormal"><b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">From:</span></b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">
llvm-dev
                                                    [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
<b>On Behalf
Of </b>Sanjay
Patel via
llvm-dev<br>
<b>Sent:</b>
Friday, March
11, 2016 3:37
AM<br>
<b>To:</b>
llvm-dev<br>
<b>Subject:</b>
[llvm-dev]
masked-load
endpoints
optimization</span></p>
<div>
<div>
<p class="MsoNormal"> </p>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12pt">If we're loading the first and last elements
of a vector
using a masked
load [1], can
we replace the
masked load
with a full
vector load?<br>
<br>
"The result of
this operation
is equivalent
to a regular
vector load
instruction
followed by a
‘select’
between the
loaded and the
passthru
values,
predicated on
the same mask.
However, using
this intrinsic
prevents
exceptions on
memory access
to masked-off
lanes."<br>
<br>
I think the
fact that
we're loading
the endpoints
of the vector
guarantees
that a full
vector load
can't have any
different
faulting/exception
behavior on
x86 and most
(?) other
targets. We
would,
however, be
reading memory
that the
program has
not explicitly
requested.</p>
</div>
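A minimal C sketch of the precondition described above, assuming the mask is a compile-time constant packed into an integer with bit i standing for lane i. The function name is hypothetical and is not an LLVM API.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: bit i of mask is set when lane i of the masked
 * load is enabled. Returns true only when both the first and the last
 * lane are enabled, so a full-width load stays within the address range
 * that the masked load already touches. */
bool mask_covers_endpoints(uint32_t mask, unsigned num_lanes)
{
    assert(num_lanes >= 1 && num_lanes <= 32);

    bool first = (mask & 1u) != 0;
    bool last = ((mask >> (num_lanes - 1)) & 1u) != 0;
    return first && last;
}
```

With four lanes, the mask <1, 0, 0, 1> from the IR example corresponds to 0x9 and passes the check, while 0x1 (only the first lane) and 0x6 (only the middle lanes) do not.<br>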
<p class="MsoNormal">IR
example:<br>
<br>
define <4 x
i32>
@maskedload_endpoints(<4
x i32>*
%addr, <4 x
i32> %v) {</p>
</div>
<p class="MsoNormal">
; load the
first and last
elements
pointed to by
%addr and
shuffle those
into %v</p>
<div>
<p class="MsoNormal" style="margin-bottom:12pt"> %res = call <4 x i32>
@llvm.masked.load.v4i32(<4
x i32>*
%addr, i32 4,
<4 x i1>
<i1 1, i1
0, i1 0, i1
1>, <4 x
i32> %v)<br>
ret <4 x
i32> %res<br>
}</p>
</div>
<div>
<p class="MsoNormal">would
become
something
like:</p>
</div>
<div>
<p class="MsoNormal"><br>
define <4 x
i32>
@maskedload_endpoints(<4
x i32>*
%addr, <4 x
i32> %v) {</p>
</div>
<div>
<p class="MsoNormal">
%vecload =
load <4 x
i32>, <4
x i32>*
%addr, align 4</p>
</div>
<div>
<p class="MsoNormal">
%sel = select
<4 x i1>
<i1 1, i1
0, i1 0, i1
1>, <4 x
i32>
%vecload,
<4 x
i32> %v</p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12pt"> ret <4 x i32> %sel<br>
}</p>
</div>
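The value equivalence between the two IR functions above can be modelled in plain C as follows. This is only a sketch with made-up function names; it models the result values and which lanes are read, not faulting behavior.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { LANES = 4 };

/* Models the masked load: disabled lanes are never read from memory and
 * take their value from passthru instead. */
void masked_load_model(const int32_t *addr, const bool mask[LANES],
                       const int32_t passthru[LANES], int32_t res[LANES])
{
    assert(addr && mask && passthru && res);
    for (int i = 0; i < LANES; ++i)
        res[i] = mask[i] ? addr[i] : passthru[i];
}

/* Models the rewritten form: read every lane first, then select per lane
 * between the loaded value and passthru. */
void load_then_select_model(const int32_t *addr, const bool mask[LANES],
                            const int32_t passthru[LANES], int32_t res[LANES])
{
    int32_t vecload[LANES];

    assert(addr && mask && passthru && res);
    for (int i = 0; i < LANES; ++i)
        vecload[i] = addr[i];          /* reads masked-off lanes too */
    for (int i = 0; i < LANES; ++i)
        res[i] = mask[i] ? vecload[i] : passthru[i];
}
```

Both models produce identical results whenever all four lanes are readable; the second one simply reads lanes 1 and 2 and then throws them away.<br>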
<div>
<p class="MsoNormal">If
this isn't
valid as an IR
optimization,
would it be
acceptable as
a DAG combine
with target
hook to opt
in?</p>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><br>
                                                          [1] <a href="http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics" target="_blank">http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics</a></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal"> </p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"> </p>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
LLVM Developers mailing list
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><span><font color="#888888">
</font></span></pre>
<span><font color="#888888"> </font></span></blockquote>
<span><font color="#888888"> <br>
<pre cols="72">--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
</pre>
</font></span></div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div></div></div></div>