[cfe-dev] [RFC] automatic variable initialization

JF Bastien via cfe-dev cfe-dev at lists.llvm.org
Thu Jan 17 11:28:57 PST 2019



> On Jan 17, 2019, at 10:49 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> 
> 
> 
> On Thu, Jan 17, 2019 at 10:24 AM JF Bastien <jfbastien at apple.com <mailto:jfbastien at apple.com>> wrote:
> 
>> On Jan 16, 2019, at 7:39 PM, Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>> 
>> 
>> 
>> On Wed, Jan 16, 2019 at 6:50 PM Peter Collingbourne <peter at pcc.me.uk <mailto:peter at pcc.me.uk>> wrote:
>> On Wed, Jan 16, 2019 at 6:10 PM Kostya Serebryany via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>> 
>> 
>> I’ve seen exactly the same problem, it’s definitely fixable but I haven’t tried to tackle it yet. Do you intend to do so? I’m more than happy to help with design / code / reviews. How do you want to proceed?
>> 
>> We even started working on a related analysis pass, but got distracted.  
>> https://reviews.llvm.org/D53336 <https://reviews.llvm.org/D53336>
>> https://reviews.llvm.org/D54541 <https://reviews.llvm.org/D54541>
>> I hope we can get back to it in a month or so, but can't promise. :( 
>> 
>> I don't think you need LTO (or any type of sophisticated analysis at all) to eliminate the stores in this particular case. It seems that all that you need to do is make it the responsibility of the ctor to perform the initialization. We'd have to see how many stores this would actually eliminate in practice, though.
>> 
>> So, we force every CTOR to fully initialize *this and make sure the callers of any CTOR know this. 
>> That's an option.  
> 
> I don’t think this is generally what we want: only considering ctors is too specific. I think we rather want a more general solution.
> 
> Why not both? It seems that moving the initialization into the ctor would totally solve the problem for ctors, even in the cross-TU case for non-users of LTO. Then your analysis would handle (at least some of) the non-ctor cases.

The upside of doing both is that clang doing the work potentially helps compile time.
The downsides are:
- We’re doing redundant work (if we have both, the clang optimization is 100% redundant).
- I worry that special-casing ctors is ever-so-slightly more complicated, and I purposefully implemented variable auto-init in a ridiculously simple / naive way so that the compiler cannot possibly get it wrong. My worry for ctors might be unfounded; seeing the code might change my mind.
- Cross-TU ctors where the ctor doesn’t do variable auto-init worry me. Sure, that’s the user’s problem… but then again, so is UB caused by using uninitialized variables, and we’re trying to be nice to the user.

I also think that we’ll want to eventually do heap-allocated variable auto-init. There, it’s not alloca which would trigger auto-init codegen, it would instead be returns from an allocation function. In that case we can’t just rely on the ctor initializing values (I think?) because the ctor doesn’t know if the allocation function returned nullptr (presumably, outside the ctor there’d be a nullptr check). Granted, this is an extremely borderline argument.


> Do you have a sense of what proportion of dead stores are caused by ctors as opposed to other initialization functions?

In C code (which I’ve looked at quite a bit), 0% of dead stores are caused by ctors :-)


> Peter
> 
> Here are notes that Duncan and I wrote when initially investigating perf / size regressions (for Apple folks, see radar 43700567 and 43704706). I haven’t prototyped an implementation yet, I don’t think it’s too hard. I’ll get to it sometime this year, but I’m sharing my notes in case you can find someone who’s willing to do the work first.
> 
> 
> # IPO should track pointer values which are written to, enabling DSE
> 
> As part of variable auto-init we're creating more stack variable initializations. A common code pattern is the following:
> 
> int optimize_me() {
>   int result;
>   my_amazing_function(&result);
>   // …
>   return result;
> }
> 
> With my change, we now initialize result because I have no idea whether my_amazing_function is truly amazing and will actually initialize result. This isn’t specific to stack variable initialization, it’s just aggravated by it. This example applies to C++ constructors as well.
> 
> I propose an inter-procedural optimization to do something somewhat similar to dereferenceable(<n>) as described here <https://llvm.org/docs/LangRef.html#parameter-attributes>.
> 
> The ultimate goal is to identify dead stores in the caller.
> 
> The algorithm would be as follows:
> 
> For each function:
>   For each parameter:
>     If the parameter is a pointer:
>       Start at the entry point, assuming no bytes are written.
>       Traverse the control flow, and collect the intersection of bytes written through each diamond.
>       Tag the function with attribute written(<M>), where M is the intersection of byte masks from all returns.
> 
> Instead of a mask we could mimic dereferenceable and only track bytes [0, N], without holes.
> 
> This analysis can always conservatively say that no bytes are ever written.
> 
> With this analysis in place, dead store elimination can then treat each call to a function tagged with written(<M>) as a store to the bytes in the mask. This will allow eliminating dead stores, of which there will be many more due to stack variable initialization. This will also allow eliminating dead stores to the heap.
> 
> I haven’t thought about the impact this would have on invoke. I think it’s OK to ignore invoke for now, and only care about call.
> 
> 
> # Frontend support for attribute to track pointer values that are written to
> 
> We need additional frontend support for "IPO should track pointer values which are written to, enabling DSE".
> 
> In particular, in some cases, I suspect we can cheaply infer this attribute in IRGen.  This would allow us to leverage it in places where IPO wouldn't be allowed to infer the attribute (e.g., linkonce_odr; see this bug for why we can't <https://bugs.llvm.org/show_bug.cgi?id=26774>).
> 
> We might also add a source-level Clang attribute, which Sema would reject (error out) if a simple CFG analysis cannot prove its correctness (error if it's invalid or "too hard to prove", similar to constexpr).  This attribute would:
> - document behaviour in the source, making it obvious in the API that it's safe to pass in uninitialized memory;
> - suppress -Wuninitialized and -Wmaybe-uninitialized in the caller; and
> - allow callers to optimize even if the callee definition is not available/in another TU.
> 
> For the source-level attribute, we might want to support something simpler than a mask as detailed above—e.g., all-or-nothing, or listing struct fields instead of bit ranges.
> 
> 
> 
> 
>> Peter
>> 
>>  
>> 
>>> Compare to -ftrivial-auto-var-init=zero:
>>> 
>>> 	xorps	%xmm0, %xmm0
>>> 	movaps	%xmm0, 48(%rsp)
>>> 	movaps	%xmm0, 32(%rsp)
>>> 	movaps	%xmm0, 16(%rsp)
>>> 	movaps	%xmm0, (%rsp)
>>> 	movq	$0, 64(%rsp)
>>> 
>>> 	movi	v0.2d, #0000000000000000
>>> 	mov	x0, sp
>>> 	str	xzr, [sp, #64]
>>> 	stp	q0, q0, [sp, #32]
>>> 	stp	q0, q0, [sp]
>>> 
>>> 
>>> OTOH, with a relatively simple LTO-based analysis we can get rid of these extra auto-inits completely. 
>>> 
>>> CC @Vitaly Buka <mailto:vitalybuka at google.com> @Peter Collingbourne <mailto:pcc at google.com> @Evgeniy Stepanov <mailto:eugenis at google.com> 
>> 
>> Why LTO? Are you observing this cross-TU? I think we want to first do same-TU, and then teach LTO about it too.
>> 
>> For the analysis pass, it doesn't matter if it's same-TU or cross-TU -- it will use all information it sees. 
>> So, it should work for same-TU even w/o LTO. 
>> We see this either cross-TU or when a CTOR is marked as noinline. 
>> 
>>  
>> 
>> 
>>> --kcc 
>>> 
>>> On Tue, Dec 18, 2018 at 11:38 AM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>> 2019 is going to have one bug class fewer. :) 
>>> (unless Jann&Co find new bug classes again)
>>> 
>>> Looking forward to the followup patches (e.g. padding gaps) 
>>> and to doing the measurements. 
>>> 
>>> --kcc 
>>> 
>>> On Tue, Dec 18, 2018 at 10:53 AM JF Bastien <jfbastien at apple.com <mailto:jfbastien at apple.com>> wrote:
>>> Hello fans of the pre-C++11 `auto` keyword!
>>> 
>>> Thanks to reviews from Peter and Richard, automatic variable initialization landed last night:
>>> https://reviews.llvm.org/rL349442 <https://reviews.llvm.org/rL349442>
>>> 
>>> Chandler pointed out that I should have circled back to the RFC in case any of the finer points needed tuning. Let me know if there’s any follow-up you’d like! I’ve made zero-init harder / uglier to use as was requested, while allowing folks like Kostya to test it out.
>>> 
>>> Let me know if you find bugs (in your code, or in the implementation), and please file bugs on any missed optimization opportunities. I know of a handful of missing optimizations which I intend to tackle in the near future. I’d also love to hear what kind of performance / size impact you see from various codebases. 
>>> 
>>> As pointed out in the patch there’s a few other hardenings that I’ll want to tackle next. I’ll do so in the new year.
>>> 
>>> Thanks,
>>> 
>>> JF
>>> 
>>> 
>>>> On Dec 11, 2018, at 3:18 PM, Joe Bialek via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>>> 
>>>> We went with zero initialization for production builds in Visual Studio because we anticipate zero being the fastest, smallest code gen, and safest value for the following cases:
>>>> - Pointer: A non-zero pointer could potentially point at valid code. On Windows x64, the first 64kb of the virtual address space is guaranteed to not be mappable and the last 512GB of the virtual address space (today) has the same guarantee. So null pointer or near-null pointer dereferences are denial of service at worst.
>>>> - Size: If you forget to set a size as an out parameter we’d rather it’s set to 0 so you don’t index out of bounds.
>>>> - Index: Same as size.
>>>>  
>>>> We are using this mitigation to downgrade vulnerabilities from remote code execution, elevation of privilege, or information disclosure, down to denial of service at worst.
>>>>  
>>>> Assuming the denial of service isn’t for a scenario where it matters (there are many types of DOS we don’t really worry about for security), we will not bother servicing the vulnerabilities down-level and instead only fix them for new code.
>>>>  
>>>> It is helpful for us to have deterministic behavior to help accomplish this goal. If an engineer knows a value is always set to 0, they can quickly determine if the bug report is actually exploitable or not. If it is initialized to a random or compile time constant value, we’ll need to do more work to determine if the bug is exploitable.
>>>>  
>>>> We are not zero initializing CHK builds to prevent folks from taking a dependency on this feature.
>>>>  
>>>>  
>>>> Joe
>>>>  
>>>> From: Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> 
>>>> Sent: Tuesday, December 11, 2018 3:04 PM
>>>> To: Mehdi AMINI <joker.eph at gmail.com <mailto:joker.eph at gmail.com>>; Jann Horn <jannh at google.com <mailto:jannh at google.com>>
>>>> Cc: David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>>; Richard Smith <richard at metafoo.co.uk <mailto:richard at metafoo.co.uk>>; Clang Dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>; Joe Bialek <jobialek at microsoft.com <mailto:jobialek at microsoft.com>>
>>>> Subject: Re: [cfe-dev] [RFC] automatic variable initialization
>>>>  
>>>> One more dimension in this discussion (zero-init vs pattern-init) is what will security get from those options.
>>>> My second-hand knowledge here suggests that zero-init may have *on average* better security guarantees than non-zero init.
>>>> For example if the uninitialized data is interpreted as a length of something, it better be zero than any large number or, worse, a negative signed number. 
>>>> +Jann Horn <mailto:jannh at google.com> and Joe Bialek who have first-hand knowledge here. 
>>>>  
>>>> --kcc 
>>>>  
>>>>  
>>>>  
>>>>  
>>>> On Thu, Dec 6, 2018 at 3:58 PM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>>>  
>>>> 
>>>> On Thu, Dec 6, 2018 at 3:55 PM Mehdi AMINI <joker.eph at gmail.com <mailto:joker.eph at gmail.com>> wrote:
>>>>  
>>>> 
>>>> On Thu, Dec 6, 2018 at 3:43 PM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>>>  
>>>> 
>>>> On Thu, Dec 6, 2018 at 3:03 PM Mehdi AMINI <joker.eph at gmail.com <mailto:joker.eph at gmail.com>> wrote:
>>>>  
>>>> 
>>>> On Thu, Dec 6, 2018 at 1:01 PM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>>>  
>>>> 
>>>> On Wed, Dec 5, 2018 at 6:07 PM Mehdi AMINI via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>>> I have the same question as David: I don't understand why this isn't just an experimental build option disabled by default? (Google, Apple, etc. can just enable it in their build script, no need to patch the source).
>>>>  
>>>> I will need to rebuild half a dozen compiler binaries to do the measurements I need.
>>>> This is going to double the cost of the effort for me because it adds too many extra moving pieces. 
>>>>  
>>>> Not sure I follow, you have to build the compiler anyway to get it after the code is patched?
>>>> The only thing you would have to do is one CL to enable the build flag (that expose the command line flag) inside your codebase and then you get your toolchain as usual?
>>>>  
>>>> Once the JF's patch is in, I'll have it in most production compilers I care about in 2-6 weeks, w/o any effort on my side. 
>>>> None of those builders (easily) support adding custom patches, and even a build flag is going to be very non-trivial. 
>>>> I can deal with one or two of those builds relatively easily, but not with the rest. 
>>>>  
>>>> What builders are you referring to? So far I was assuming we were talking about your internal infrastructure that produces your production compilers.
>>>>  
>>>> I am talking about our internal infra. 
>>>> But we have multiple independent builders from multiple independent compiler users (e.g. Chrome and Android are using different compiler builds). 
>>>>  
>>>>  
>>>>  
>>>>  
>>>>  
>>>>  
>>>> -- 
>>>> Mehdi
>>>>  
>>>>  
>>>>  
>>>>  
>>>>  
>>>> The only argument I read in the thread was about allowing people to make performance measurements without rebuilding the compiler, but I have a hard time reconciling this with the fact that we're talking about not shipping this before performing the actual measurements?
>>>>  
>>>> I expect that anyone that cares enough about the perf impact of this to influence the development of the feature in clang should already be rebuilding their compiler today. 
>>>>  
>>>> -- 
>>>> Mehdi
>>>>  
>>>>  
>>>> On Mon, Dec 3, 2018 at 5:58 PM David Blaikie via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>>> Just out of curiosity - what's the major difference between a build-time off-by-default feature and a build-time on-by-default-but-off-in-release branches feature? If we're only targeting groups that build/release the compiler themselves, then they're likely able to opt-in to a build-time feature easily enough, I'd think? & then there's no need to make our releases different from day-to-day builds?
>>>> 
>>>> But sounds like folks are in general agreement of a way forward, so I don't want to disrupt/delay that.
>>>> 
>>>> On Wed, Nov 28, 2018 at 11:14 PM Chandler Carruth <chandlerc at gmail.com <mailto:chandlerc at gmail.com>> wrote:
>>>> Suggested compromise technique to at least get an initial set of numbers:
>>>>  
>>>> 1) Require a special, long, ugly flag name.
>>>> 2) Make it a CC1 flag, requiring -Xclang ... to use.
>>>> 3) Emit a warning by default (that cannot be suppressed with a -Wno-... flag) when this flag is enabled.
>>>> 4) Commit to never including this flag in any upstream release. Either we remove it before the next release branches or we revert it on the branch.
>>>>  
>>>> Most of the folks we're hoping to get performance data with are willing to use a not-yet-released build of Clang. They won't have to actually patch it in any way. They will have strong reminders to not deploy this in any way due to the warning.
>>>>  
>>>> Thoughts?
>>>>  
>>>> On Wed, Nov 28, 2018 at 4:34 PM Kostya Serebryany via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>>> On Wed, Nov 28, 2018 at 3:28 PM David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>  
>>>> 
>>>> On Wed, Nov 28, 2018 at 3:17 PM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>>> 
>>>> Seems easier to me to separate the two pieces - move ahead with the non-zero options, and separate the discussion on the zero option. You can present performance numbers from what you can measure without shipping a compiler with the feature - and if those numbers are sufficiently compelling compared to the risks of slicing the language, then perhaps we go that way.
>>>>  
>>>> This approach will significantly impair my ability to do the measurements I need. 
>>>> 
>>>> I'm aware what I'm proposing would make it more difficult for some people to take measurements - that's a tradeoff to be sure - one where I err in this direction.
>>>> 
>>>> Specifically for Google though - would it be that difficult for Google to opt-in to a certain build configuration of LLVM?
>>>>  
>>>> Absolutely yes. 
>>>> Google is not just a single monolithic code base <http://delivery.acm.org/10.1145/2860000/2854146/p78-potvin.pdf>. 
>>>> 
>>>> Couldn't access this link ("An error occurred while processing your request") - but yeah, I understand there's a bunch of different pieces of Google beyond the "stuff that runs in data centers" piece we mostly support.
>>>>  
>>>> Besides that monolithic thing, we have Android, Chrome, ChromeOS, Fuchsia, and a bazillion of smaller efforts that use their own toolchains. 
>>>> 
>>>> Still, most/all of these build their own compilers, I think? But yeah, that adds an opt-in overhead to each project, for sure.
>>>>  
>>>> In some cases the most reliable and complete way of measuring performance changes is to submit the changes to revision control, 
>>>> and let the performance bots chew on it for a couple of days. That's how we iterated with LLVM's CFI in Chrome. 
>>>> We will also need to work with the upstream Linux kernel -- it's hard enough for them to use clang and a modified clang will cost us much more effort. 
>>>> 
>>>> Yeah, I can imagine that one's a bit trickier - how's performance evaluation of the kernel done?
>>>>  
>>>> I don't think anyone knows that. :-|
>>>> And requiring a compiler patch will shift the problem from "hard" to "I'd better do something else". 
>>>>  
>>>> (though, again, I imagine a fair amount of progress could be made without the zero-init feature - perhaps enough to say "hey, here are all the places we have run tests & seen the performance tradeoff is worthwhile for us (& possibly that it's close to the zero-init case, but that's sort of orthogonal, imho - that it's worthwhile is the main thing)" - perhaps other folks would be willing to test it (non-zero init) & see if it's worthwhile to them - and if it isn't/they're interested in more performance, maybe all that evidence we can gain from the places where it's easy for us to rebuild compilers, etc, would make it interesting enough to motivate someone to build the kernel with a custom compiler & do some performance measurements, etc.)
>>>> 
>>>> Sorry that was a bit rambly, anyway.
>>>> 
>>>> - Dave
>>>>  
>>>>  
>>>>  
>>>>  
>>>> We/Google do build the compiler from scratch, I assume we pick the configuration options we build with & some of them probably aren't the defaults for a release build of LLVM. So if it was important that Google's production compiler had these features enabled (rather than building a test compiler for running some experiments), that doesn't seem (at least to me, at this moment) especially prohibitive, is it?
>>>>  
>>>> _______________________________________________
>>>> 
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>> 
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>> 
>> 
>> -- 
>> -- 
>> Peter
> 
> 
> 
> -- 
> -- 
> Peter
