[cfe-dev] the as-if rule / perf vs. security

Sanjay Patel via cfe-dev cfe-dev at lists.llvm.org
Tue Mar 15 08:46:18 PDT 2016


[cc'ing cfe-dev because this may require some interpretation of language
law]

My understanding is that the compiler has the freedom to access extra data
in C/C++ (not sure about other languages); AFAIK, the LLVM LangRef is
silent about this. In C/C++, this is based on the "as-if rule":
http://en.cppreference.com/w/cpp/language/as_if
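
To make that concrete, here is a small, hypothetical C++ example (not taken
from this thread): the source only names two fields, but under the as-if
rule a backend may widen the access to cover the bytes in between, because
the extra reads are not observable in the abstract machine.

// Hypothetical illustration only: the function reads just s.a and s.d, but
// because the whole struct object is known to be dereferenceable, a
// compiler may emit a single 16-byte vector load that also reads the
// "unknown" bytes of s.b and s.c (compare the movups example below).
struct S { int a, b, c, d; };

int sum_ends(const S &s) {
  return s.a + s.d; // may become one wide load plus extracts and an add
}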

So the question is: where should the optimizer draw the line with respect
to perf vs. security if it involves operating on unknown data? Are there
guidelines that we can use to decide this?

The masked load transform referenced below is not unique in accessing /
operating on unknown data. In addition to the related scalar loads ->
vector load transform that I've mentioned earlier in this thread, see for
example:
https://llvm.org/bugs/show_bug.cgi?id=20358
(and the security paper and patch review linked there)


On Mon, Mar 14, 2016 at 10:26 PM, Shahid, Asghar-ahmad <
Asghar-ahmad.Shahid at amd.com> wrote:

> Hi Sanjay,
>
>
>
> >The real question I have is whether it is legal to read the extra memory,
> >regardless of whether this is a masked load or something else.
>
> No, it is not legal AFAIK, because by doing that we expose memory contents
> that the programmer did not intend to expose. This may be vulnerable to
> exploitation.
>
>
>
> Regards,
>
> Shahid
>
>
>
>
>
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Sanjay
> Patel via llvm-dev
> Sent: Monday, March 14, 2016 10:37 PM
> To: Nema, Ashutosh
> Cc: llvm-dev
> Subject: Re: [llvm-dev] masked-load endpoints optimization
>
>
>
> I checked in a patch to do this transform for x86-only for now:
> http://reviews.llvm.org/D18094 / http://reviews.llvm.org/rL263446
>
>
>
> On Fri, Mar 11, 2016 at 9:57 AM, Sanjay Patel <spatel at rotateright.com>
> wrote:
>
> Thanks, Ashutosh.
>
> Yes, either TTI or TLI could be used to limit the transform if we do it in
> CGP rather than the DAG.
>
> The real question I have is whether it is legal to read the extra memory,
> regardless of whether this is a masked load or something else.
>
> Note that the x86 backend already does this, so either my proposal is ok
> for x86, or we're already doing an illegal optimization:
>
>
> define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) {
>   %ld1 = load i32, i32* %addr1
>   %addr2 = getelementptr i32, i32* %addr1, i64 3
>   %ld2 = load i32, i32* %addr2
>   %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0
>   %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3
>   ret <4 x i32> %vec2
> }
>
> $ ./llc -o - loadcombine.ll
> ...
>     movups    (%rdi), %xmm0
>     retq
>
>
>
>
> On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com>
> wrote:
>
> This looks interesting; the main motivation appears to be replacing a masked
> vector load with a general vector load followed by a select.
>
>
>
> Masked vector loads have been observed to be expensive in general compared
> with a plain vector load.
>
>
>
> But if the first and last elements of a masked vector load are guaranteed to
> be accessed, then it can be transformed into a plain vector load.
>
>
>
> In opt this can be driven by TTI, which should check whether this
> transformation is beneficial.
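
As a rough sketch of such a TTI-driven profitability check (the exact
TargetTransformInfo cost-hook signatures vary across LLVM versions, so
treat this as an outline rather than actual tree code):

// Rough sketch only: compare the cost of keeping the masked load against
// a plain vector load plus a select on the same mask. The TTI cost hooks
// used here exist, but their signatures differ between LLVM revisions.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

static bool loadPlusSelectIsCheaper(const TargetTransformInfo &TTI,
                                    Type *VecTy, unsigned Alignment,
                                    unsigned AddrSpace) {
  unsigned MaskedCost = TTI.getMaskedMemoryOpCost(Instruction::Load, VecTy,
                                                  Alignment, AddrSpace);
  unsigned LoadCost =
      TTI.getMemoryOpCost(Instruction::Load, VecTy, Alignment, AddrSpace);
  unsigned SelectCost = TTI.getCmpSelInstrCost(Instruction::Select, VecTy);
  return LoadCost + SelectCost <= MaskedCost;
}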
>
>
>
> Regards,
>
> Ashutosh
>
>
>
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Sanjay
> Patel via llvm-dev
> Sent: Friday, March 11, 2016 3:37 AM
> To: llvm-dev
> Subject: [llvm-dev] masked-load endpoints optimization
>
>
>
> If we're loading the first and last elements of a vector using a masked
> load [1], can we replace the masked load with a full vector load?
>
> "The result of this operation is equivalent to a regular vector load
> instruction followed by a ‘select’ between the loaded and the passthru
> values, predicated on the same mask. However, using this intrinsic prevents
> exceptions on memory access to masked-off lanes."
>
> I think the fact that we're loading the endpoints of the vector guarantees
> that a full vector load can't have any different faulting/exception
> behavior on x86 and most (?) other targets. We would, however, be reading
> memory that the program has not explicitly requested.
>
> IR example:
>
> define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {
>   ; load the first and last elements pointed to by %addr and shuffle those into %v
>   %res = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %v)
>   ret <4 x i32> %res
> }
>
> would become something like:
>
>
> define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {
>   %vecload = load <4 x i32>, <4 x i32>* %addr, align 4
>   %sel = select <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %vecload, <4 x i32> %v
>   ret <4 x i32> %sel
> }
>
> If this isn't valid as an IR optimization, would it be acceptable as a DAG
> combine with a target hook to opt in?
>
>
> [1] http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics
>
>
>
>
>