[cfe-dev] [RFC] automatic variable initialization
JF Bastien via cfe-dev
cfe-dev at lists.llvm.org
Wed Jan 16 16:55:46 PST 2019
> On Jan 16, 2019, at 4:48 PM, Kostya Serebryany <kcc at google.com> wrote:
>
> we've collected the performance and code size data on one of our major server-side benchmarks (huge x86_64 binary)
Nice, thanks for collecting these!
>
> Stack Init Method                -O1     -O2     -O3     -O2, ThinLTO   -O2, ThinLTO, FDO
> zero-init      Size Regression   0.5%    0.6%    0.6%    0.4%           0.3%
>                Perf Regression   0.7%   -0.9%    1.1%    1.0%           1.7%
> pattern-init   Size Regression   1.6%    1.6%    1.6%    1.4%           1.6%
>                Perf Regression   1.1%    1.2%    1.2%    0.8%           1.1%
>
> so, already not bad, but we pay too much code size for the pattern-init.
>
> The following seems to be the dominating code pattern where we have to insert initialization:
>
> struct Bar {  // think std::vector
>   Bar() : a(nullptr), b(nullptr), c(nullptr) {}
>   long *a, *b, *c;
> };
> struct Foo {
>   Foo() __attribute__((noinline)) {}
>   Bar v1, v2, v3;
> };
> int main() { Foo foo; }
>
> The Foo ctor does not get inlined, so we insert initialization for all of Foo's fields.
> What's worse is that -ftrivial-auto-var-init=pattern produces very sub-optimal code here:
>
> x86_64:
> movq .L__const.main.foo+64(%rip), %rax
> movq %rax, 64(%rsp)
> movups .L__const.main.foo+48(%rip), %xmm0
> movaps %xmm0, 48(%rsp)
> movups .L__const.main.foo+32(%rip), %xmm0
> movaps %xmm0, 32(%rsp)
> movups .L__const.main.foo+16(%rip), %xmm0
> movaps %xmm0, 16(%rsp)
> movups .L__const.main.foo(%rip), %xmm0
> movaps %xmm0, (%rsp)
>
> aarch64:
> adrp x8, .L__const.main.foo
> add x8, x8, :lo12:.L__const.main.foo
> ldp q0, q3, [x8]
> ldr x9, [x8, #64]
> ldp q1, q2, [x8, #32]
> mov x0, sp
> stp q0, q3, [sp]
> str x9, [sp, #64]
> stp q1, q2, [sp, #32]
>
> I.e. instead of materializing the 16-byte pattern once, it does it many times.
I’ve seen exactly the same problem; it’s definitely fixable, but I haven’t tried to tackle it yet. Do you intend to do so? I’m more than happy to help with design / code / reviews. How do you want to proceed?
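To make "materialize the pattern once" concrete, here's a rough C-level sketch of the shape I'd expect the lowering to take, mirroring what the zero-init case already gets. This is only my illustration: the helper function and the 0xAA fill byte are assumptions, not what clang emits today.

  // Sketch only: reuse one 16-byte repeating chunk for every 16-byte store,
  // instead of reloading each chunk from a per-variable constant.
  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  void pattern_init(void *p, std::size_t size) {
    alignas(16) static const std::uint8_t chunk[16] = {
      0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA,
      0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA,
    };  // assumed fill byte; the real pattern value is an implementation detail
    auto *dst = static_cast<std::uint8_t *>(p);
    std::size_t i = 0;
    for (; i + 16 <= size; i += 16)
      std::memcpy(dst + i, chunk, 16);        // same 16-byte chunk for every aligned store
    if (i < size)
      std::memcpy(dst + i, chunk, size - i);  // tail, e.g. Foo's trailing 8 bytes
  }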
> Compare to -ftrivial-auto-var-init=zero:
>
> xorps %xmm0, %xmm0
> movaps %xmm0, 48(%rsp)
> movaps %xmm0, 32(%rsp)
> movaps %xmm0, 16(%rsp)
> movaps %xmm0, (%rsp)
> movq $0, 64(%rsp)
>
> movi v0.2d, #0000000000000000
> mov x0, sp
> str xzr, [sp, #64]
> stp q0, q0, [sp, #32]
> stp q0, q0, [sp]
>
>
> OTOH, with a relatively simple LTO-based analysis we can get rid of these extra auto-inits completely.
>
> CC @Vitaly Buka <mailto:vitalybuka at google.com> @Peter Collingbourne <mailto:pcc at google.com> @Evgeniy Stepanov <mailto:eugenis at google.com>
Why LTO? Are you observing this cross-TU? I think we want to first do same-TU, and then teach LTO about it too.
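To spell out the two cases as I understand them (my own illustration, not code from the thread):

  // Same-TU: the constructor body is visible in this TU even though it is
  // noinline, so an intra-module analysis could prove it overwrites every byte
  // of the object and then drop the auto-init stores emitted at "Foo foo;".
  struct Foo {
    __attribute__((noinline)) Foo() : a(nullptr), b(nullptr), c(nullptr) {}
    long *a, *b, *c;
  };
  void same_tu() { Foo foo; }  // auto-init stores here are provably dead

  // Cross-TU: only a declaration is visible and the definition lives in another
  // TU, so proving the stores dead needs LTO (or some kind of summary/attribute).
  struct Vec {
    Vec();  // defined in a different translation unit
    long *a, *b, *c;
  };
  void cross_tu() { Vec v; }  // auto-init stores have to stay without LTO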
> --kcc
>
> On Tue, Dec 18, 2018 at 11:38 AM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
> 2019 is going to have one bug class fewer. :)
> (unless Jann&Co find new bug classes again)
>
> Looking forward to the followup patches (e.g. padding gaps)
> and to doing the measurements.
>
> --kcc
>
> On Tue, Dec 18, 2018 at 10:53 AM JF Bastien <jfbastien at apple.com <mailto:jfbastien at apple.com>> wrote:
> Hello fans of the pre-C++11 `auto` keyword!
>
> Thanks to reviews from Peter and Richard, automatic variable initialization landed last night:
> https://reviews.llvm.org/rL349442 <https://reviews.llvm.org/rL349442>
>
> Chandler pointed out that I should have circled back to the RFC in case any of the finer points needed tuning. Let me know if there’s any follow-up you’d like! I’ve made zero-init harder / uglier to use as was requested, while allowing folks like Kostya to test it out.
>
> Let me know if you find bugs (in your code, or in the implementation), and please file bugs on any missed optimization opportunities. I know of a handful of missing optimizations which I intend to tackle in the near future. I’d also love to hear what kind of performance / size impact you see from various codebases.
>
> As pointed out in the patch, there are a few other hardenings that I’ll want to tackle next. I’ll do so in the new year.
>
> Thanks,
>
> JF
>
>
>> On Dec 11, 2018, at 3:18 PM, Joe Bialek via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>
>> We went with zero initialization for production builds in Visual Studio because we anticipate zero being the fastest, producing the smallest code gen, and being the safest value for the following cases:
>> - Pointer: A non-zero pointer could potentially point at valid code. On Windows x64, the first 64 KB of the virtual address space is guaranteed not to be mappable, and the last 512 GB of the virtual address space (today) has the same guarantee. So null-pointer or near-null-pointer dereferences are denial of service at worst.
>> - Size: If you forget to set a size as an out parameter, we’d rather it be set to 0 so you don’t index out of bounds.
>> - Index: Same as size.
>>
>> We are using this mitigation to downgrade vulnerabilities from remote code execution, elevation of privilege, or information disclosure, down to denial of service at worst.
>>
>> Assuming the denial of service isn’t for a scenario where it matters (there are many types of DOS we don’t really worry about for security), we will not bother servicing the vulnerabilities down-level and instead only fix them for new code.
>>
>> It is helpful for us to have deterministic behavior to help accomplish this goal. If an engineer knows a value is always set to 0, they can quickly determine if the bug report is actually exploitable or not. If it is initialized to a random or compile time constant value, we’ll need to do more work to determine if the bug is exploitable.
>>
>> We are not zero initializing CHK builds to prevent folks from taking a dependency on this feature.
>>
>>
>> Joe
>>
>> From: Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>>
>> Sent: Tuesday, December 11, 2018 3:04 PM
>> To: Mehdi AMINI <joker.eph at gmail.com <mailto:joker.eph at gmail.com>>; Jann Horn <jannh at google.com <mailto:jannh at google.com>>
>> Cc: David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>>; Richard Smith <richard at metafoo.co.uk <mailto:richard at metafoo.co.uk>>; Clang Dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>; Joe Bialek <jobialek at microsoft.com <mailto:jobialek at microsoft.com>>
>> Subject: Re: [cfe-dev] [RFC] automatic variable initialization
>>
>> One more dimension in this discussion (zero-init vs pattern-init) is what security gets out of each option.
>> My second-hand knowledge here suggests that zero-init may have *on average* better security guarantees than non-zero init.
>> For example, if the uninitialized data is interpreted as the length of something, it had better be zero than any large number or, worse, a negative signed number.
>> +Jann Horn <mailto:jannh at google.com> and Joe Bialek who have first-hand knowledge here.
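To make the "length" point above concrete, here's a toy sketch (my own illustration, not code from any report; get_payload is a made-up API):

  #include <cstddef>
  #include <cstring>

  // Hypothetical API: can fail and leave *len untouched on an error path.
  bool get_payload(const char **data, std::size_t *len);

  void copy_payload(char *dst) {
    const char *data = "";
    std::size_t len;              // stays uninitialized if get_payload() bails out early
    get_payload(&data, &len);     // return value (wrongly) ignored
    std::memcpy(dst, data, len);  // zero-init: a 0-byte copy, effectively harmless;
                                  // pattern-init: len is some huge pattern value, so
                                  // this becomes a wild copy, a crash (DoS) at best
  }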
>>
>> --kcc
>>
>>
>>
>>
>> On Thu, Dec 6, 2018 at 3:58 PM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>
>>
>> On Thu, Dec 6, 2018 at 3:55 PM Mehdi AMINI <joker.eph at gmail.com <mailto:joker.eph at gmail.com>> wrote:
>>
>>
>> On Thu, Dec 6, 2018 at 3:43 PM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>
>>
>> On Thu, Dec 6, 2018 at 3:03 PM Mehdi AMINI <joker.eph at gmail.com <mailto:joker.eph at gmail.com>> wrote:
>>
>>
>> On Thu, Dec 6, 2018 at 1:01 PM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>
>>
>> On Wed, Dec 5, 2018 at 6:07 PM Mehdi AMINI via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>> I have the same question as David: I don't understand why this isn't just an experimental build option disabled by default? (Google, Apple, etc. can just enable it in their build script, no need to patch the source).
>>
>> I will need to rebuild half a dozen compiler binaries to do the measurements I need.
>> This is going to double the cost of the effort for me because it adds too many extra moving pieces.
>>
>> Not sure I follow: you have to build the compiler anyway to get it after the code is patched?
>> The only thing you would have to do is one CL to enable the build flag (that exposes the command line flag) inside your codebase, and then you get your toolchain as usual?
>>
>> Once JF's patch is in, I'll have it in most production compilers I care about in 2-6 weeks, w/o any effort on my side.
>> None of those builders (easily) support adding custom patches, and even a build flag is going to be very non-trivial.
>> I can deal with one or two of those builds relatively easily, but not with the rest.
>>
>> What builders are you referring to? So far I was assuming we were talking about your internal infrastructure that produces your production compilers.
>>
>> I am talking about our internal infra.
>> But we have multiple independent builders from multiple independent compiler users (e.g. Chrome and Android are using different compiler builds).
>>
>>
>>
>>
>>
>>
>> --
>> Mehdi
>>
>>
>>
>>
>>
>> The only argument I read in the thread was about allowing people to make performance measurements without rebuilding the compiler, but I have a hard time reconciling this with the fact that we're talking about not shipping this before performing the actual measurements?
>>
>> I expect that anyone who cares enough about the perf impact of this to influence the development of the feature in clang should already be rebuilding their compiler today.
>>
>> --
>> Mehdi
>>
>>
>> On Mon, Dec 3, 2018 at 5:58 PM David Blaikie via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>> Just out of curiosity - what's the major difference between a build-time off-by-default feature and a build-time on-by-default-but-off-in-release branches feature? If we're only targeting groups that build/release the compiler themselves, then they're likely able to opt-in to a build-time feature easily enough, I'd think? & then there's no need to make our releases different from day-to-day builds?
>>
>> But sounds like folks are in general agreement of a way forward, so I don't want to disrupt/delay that.
>>
>> On Wed, Nov 28, 2018 at 11:14 PM Chandler Carruth <chandlerc at gmail.com <mailto:chandlerc at gmail.com>> wrote:
>> Suggested compromise technique to at least get an initial set of numbers:
>>
>> 1) Require a special, long, ugly flag name.
>> 2) Make it a CC1 flag, requiring -Xclang ... to use.
>> 3) Emit a warning by default (that cannot be suppressed with a -Wno-... flag) when this flag is enabled.
>> 4) Commit to never including this flag in any upstream release. Either we remove it before the next release branches or we revert it on the branch.
>>
>> Most of the folks we're hoping to get performance data with are willing to use a not-yet-released build of Clang. They won't have to actually patch it in any way. They will have strong reminders to not deploy this in any way due to the warning.
>>
>> Thoughts?
>>
>> On Wed, Nov 28, 2018 at 4:34 PM Kostya Serebryany via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>> On Wed, Nov 28, 2018 at 3:28 PM David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>
>>
>> On Wed, Nov 28, 2018 at 3:17 PM Kostya Serebryany <kcc at google.com <mailto:kcc at google.com>> wrote:
>>
>> Seems easier to me to separate the two pieces - move ahead with the non-zero options, and separate the discussion on the zero option. You can present performance numbers from what you can measure without shipping a compiler with the feature - and if those numbers are sufficiently compelling compared to the risks of slicing the language, then perhaps we go that way.
>>
>> This approach will significantly impair my ability to do the measurements I need.
>>
>> I'm aware what I'm proposing would make it more difficult for some people to take measurements - that's a tradeoff to be sure - one where I err in this direction.
>>
>> Specifically for Google though - would it be that difficult for Google to opt-in to a certain build configuration of LLVM?
>>
>> Absolutely yes.
>> Google is not just a single monolithic code base <http://delivery.acm.org/10.1145/2860000/2854146/p78-potvin.pdf>.
>>
>> Couldn't access this link ("An error occurred while processing your request") - but yeah, I understand there's a bunch of different pieces of Google beyond the "stuff that runs in data centers" piece we mostly support.
>>
>> Besides that monolithic thing, we have Android, Chrome, ChromeOS, Fuchsia, and a bazillion smaller efforts that use their own toolchains.
>>
>> Still, most/all of these build their own compilers, I think? But yeah, that adds an opt-in overhead to each project, for sure.
>>
>> In some cases the most reliable and complete way of measuring performance changes is to submit the changes to revision control,
>> and let the performance bots chew on it for a couple of days. That's how we iterated with LLVM's CFI in Chrome.
>> We will also need to work with the upstream Linux kernel -- it's hard enough for them to use clang, and a modified clang will cost us much more effort.
>>
>> Yeah, I can imagine that one's a bit trickier - how's performance evaluation of the kernel done?
>>
>> I don't think anyone knows that. :-|
>> And requiring a compiler patch will shift the problem from "hard" to "I'd better do something else".
>>
>> (though, again, I imagine a fair amount of progress could be made without the zero-init feature - perhaps enough to say "hey, here are all the places we have run tests & seen the performance tradeoff is worthwhile for us (& possibly that it's close to the zero-init case, but that's sort of orthogonal, imho - that it's worthwhile is the main thing)" - perhaps other folks would be willing to test it (non-zero init) & see if it's worthwhile to them - and if it isn't / they're interested in more performance, maybe all that evidence we can gain from the places where it's easy for us to rebuild compilers, etc, would make it interesting enough to motivate someone to build the kernel with a custom compiler & do some performance measurements, etc...)
>>
>> Sorry that was a bit rambly, anyway.
>>
>> - Dave
>>
>>
>>
>>
>> We/Google do build the compiler from scratch; I assume we pick the configuration options we build with, & some of them probably aren't the defaults for a release build of LLVM. So if it was important that Google's production compiler had these features enabled (rather than building a test compiler for running some experiments), that doesn't seem (at least to me, at this moment) especially prohibitive, is it?
>>