[llvm-dev] [RFC] Control Flow Sensitive AutoFDO (FS-AFDO)

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 20 09:34:08 PST 2020


On Thu, Nov 19, 2020 at 11:29 PM Rong Xu <xur at google.com> wrote:

> Hi Wenlei, thanks for comments and questions! Please see my comments
> inline.
>
> On Wed, Nov 18, 2020 at 4:34 PM Wenlei He <wenlei at fb.com> wrote:
>
>> Hi Rong,
>>
>>
>>
>> This is a very interesting proposal. We've also observed profile quality
>> degradation from CFG destructive pass like loop rotate, and I can see how
>> this proposal would help improve quality of profile that drives later
>> optimization passes in the pipeline. I have a few questions.
>>
>>
>>
>>    - How does this affect today's AutoFDO? Specifically, can users
>>    upgrade compiler with FS-AutoFDO support but without refreshing their
>>    profile? I think it's important to make new improvement like this as
>>    opt-in, so other users of AutoFDO can choose if and when they want to make
>>    the switch from AutoFDO to FS-AutoFDO. With the proposed changes to
>>    discriminator encoding, sounds like we are going to eliminate duplication
>>    factor etc. altogether. In that case, multiple FS-AutoFDO sample profile
>>    loading would be required in order to not regress from today's AutoFDO due
>>    to lack of duplication factor. Is that correct? If so, this change as is
>>    can break backward compatibility with today's AutoFDO profile. It’d be
>>    great to handle discriminator in a way that compatible with today’s
>>    AutoFDO.
>>
>> FS-AFDO has a new discriminator encoding scheme.  You are right that it's
> not backward compatible with current AFDO profiles. The conflict is mainly
> from the duplication factors -- as it will be completely replaced.
> The only possible regression I can think of is using current AFDO profile
> as an FS-AFDO profile. Due to the difference in encoding, some
> discriminators might be handled incorrectly.
> I think we can add a flag to the AFDO header to let the compiler know if
> this is the new FS-AFDO profile.
>
> We did have some discussions on the deployment at Google -- it will
> require two stages: the first stage enables the new discriminator scheme
> but still reads current AFDO profile. In the second stage, we will use the
> FS-AFDO profile and enable FS-AFDO passes.
>
>
>>
>>    - Profile loading usually happens early in the pipeline in order to
>>    avoid mismatch between profiling and optimization. In the case of
>>    FS-AutoFDO, some mismatch may be inevitable for the later FS profile
>>    loading. In practice, how significant is the mismatch in later profile
>>    loading? Have you tried to see by how much using multiple iterations (like
>>    CSPGO) can further help profile quality and improve performance? I
>>    understand this is not practical for production use, but could give us data
>>    points as to what's left on the table due to mismatch of later FS profile
>>    loading.
>>
>>  I did measure the coverage (The same mechanism used in SampleLoader).
> For our non-triel programs, we can use about 74% of the samples (for the
> FS-AFDO loader before machine block placement).
> I also tried the iterative FS-AFDO.  Unfortunately iterative AFDO alone
> shows a non-trival regression which is larger than the gain for FS-AFDO.  I
> did that experiment a while ago. I might need to revisit it later.
>
>>
Was the experiment (iterative FS-AFDO) with one round of FS profile loading
(for MBP) or multiple?  Like AFDO, transformation decision differences like
ifcvt can affect profile quality.




>
>>
>>    - In your performance experiment, where is the extra
>>    FSProfileSampleLoader in the pipeline, right before machine block placement
>>    or somewhere else? Have you tried adding more than one
>>    FSProfileSampleLoader and is there extra perf gain for more than one
>>    FSProfileSampleLoader?
>>
>> The main performance is from the FSProfileSampleLoader before machine
> block placement. I did add one more loader after block placement (and a new
> BranchFolder pass to do the tail merge). But that did not give me visible
> performance.
> That said, I think there is still lots of tuning work to do for better
> performance. I will definitely try more experimental loader passes.
>
>>
>>    - For the final discriminator assignment, is this because we take MAX
>>    of samples on addresses from the same location? So if there's any late code
>>    duplication after last discriminator assignment, we need a final
>>    discriminator assignment to turn MAX into SUM for earlier FS profile?
>>
>> Yes. That's the main consideration. create_llvm_profile still use MAX for
> the same locations. The final discriminator assignment is to assign
> different discrimnatiro for each duplicated code so their sample can be
> summed.
>
>>
There is taildup happening in MBP, so there are branch clonings.




>
>>
>>    - While changing discriminator encoding, have you considered encode
>>    block Id into discriminator so AutoFDO can be CFG aware? This is something
>>    Wei brought up earlier in the context of CSSPGO, and we didn't go that
>>    route partly because it's hard to do it without breaking compatibility.
>>
>> Discriminator only has 32 bits. It's hard to inject more information. In
> the early stage of this work, we did consider using BB labels to encode
> more the flow information. This would use Sri's work on BasicBlock
> Sections.  But that required significant change than the discriminator
> based method.
>
>>
>>
>> This would be complementary to the context-sensitive SPGO we proposed
>> earlier - would be nice to make PGO/FDO both context-sensitive and
>> flow-sensitive. I think the flow-sensitive part could also integrate with
>> pseudo-probe, essentially we can append FS discriminator later multiple
>> times the same way as you proposed here, except that the "line" part would
>> be "probeId".
>>
> Yes. I think this idea could be used in your pseudo-probe work. The good
> thing about pseudo-probe is that it does not have a real instruction and
> the compiler can afford multiple round probe insertions.
>
>>
>>
>> +Hongtao as well.
>>
>>
>>
>> Thanks,
>>
>> Wenlei
>>
>>
>>
>> *From: *llvm-dev <llvm-dev-bounces at lists.llvm.org>
>> *Date: *Tuesday, November 17, 2020 at 9:55 AM
>> *To: *llvm-dev <llvm-dev at lists.llvm.org>
>> *Cc: *David Li <davidxl at google.com>
>> *Subject: *[llvm-dev] [RFC] Control Flow Sensitive AutoFDO (FS-AFDO)
>>
>> Hi all,
>>
>> Here I include an RFC for control flow sensitive AutoFDO (FS-AFDO). This
>> is a joint work with David Li. Questions and feedback are welcome.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Rong
>>
>>
>>
>> =============
>>
>> [RFC] Control Flow Sensitive AutoFDO (FS-AFDO)
>>
>>
>>
>> 1. Motivation
>>
>>
>>
>> AFDO profile is derived from PMU samples from running an earlier build
>> binary. PMU samples are indexed by the IP addresses. An offline tool uses
>> the debug line number to map PMU samples to the source line. The accuracy
>> of debug information is critical for AFDO performance. Thanks to the
>> continuous efforts from the community, we have fixed many debug info
>> related issues that affect AFDO.
>>
>>
>>
>> One of the remaining areas that affects the profile accuracy is the
>> handling of duplicated code. Aggressive compiler optimizations can change
>> functions’ control flow significantly. The transformation can duplicate
>> branches in multiple places and new branches that do not exist in the
>> original source code might also be introduced. All these branches, even
>> though originated from the same source line, may have very different
>> behavior -- which is called (control) flow sensitivity.
>>
>>
>>
>> Current AFDO handles the duplication in a pretty coarse grain way. When
>> AFDO sees samples with different IP addresses that map to the same source
>> line, it uses a max function to get the maximum number of samples and
>> attach to that line. From this prospective, current AFDO is not control
>> flow sensitive. Consider the following example:
>>
>>   // Original code:
>>
>>  for()
>>
>>     Stmt1; // line 10
>>
>>
>>
>>   // After a versioning optimization
>>
>>   for() {
>>
>>     if (cond1)
>>
>>        stmt1;  // line 10, 100 samples, address: C1
>>
>>     else if (cond2)
>>
>>        stmt1;  // line 10, 80 samples, address: C2
>>
>>     else
>>
>>        stmt1;  // line 10, 20 samples, address: C3
>>
>>   }
>>
>> In the final binary, stmt1 is cloned 3 times by the versioning
>> optimization. The address C1, gets 100 samples, the address C2 gets 80
>> samples, and the address C3 gets 20 samples. When converting Perf samples
>> to AFDO profile, a max function is used for these samples as they have the
>> same source line. In the AFDO profile, we will have a sample entry of <10,
>> 100>, which means we have a  sample value of 100 for line 10 of this
>> function.  We will not have branch information for cond1, cond2, as they do
>> not exist in the sample loading phase. The profile count for stmt1 is also
>> off the actual value (200 = 100 +80 + 20) by 50%.
>>
>>
>>
>> In this work, we propose a method that maintains more precise AFDO
>> profile counts for the duplicated control flow.
>>
>>
>>
>> 2. FS-AFDO Designs
>>
>>
>>
>> In this work, we don't aim to keep track of the transformations and make
>> the flow sensitivity available across the compilation pipelines. Rather, we
>> identify a set of target optimizations that can potentially benefit most
>> from a more precise profile. For each of such optimizations, we use a
>> helper pass to recompute the flow sensitive profile just before this pass.
>> When loading this profile, we will recompute the branch probability and
>> hopefully improve the optimization quality for that target pass.
>>
>>
>>
>> The flow sensitivity in the profile is through a hierarchical
>> discriminator scheme that can better preserve flow information in the Perf
>> profile.
>>
>>
>>
>> 2.1. Current Use of Discriminators
>>
>> The AFDO profile contains a discriminator field whose goal is to
>> differentiate statements with the same line number in  the source code
>> (largely from MACRO expansions).
>>
>>
>>
>> Discriminators were later expanded to get more accurate profile counts.
>> It is expanded to 3 components which can partially address the issue. The
>> main objective of that work was to get the correct profile count for the
>> unrolled (or vectorized) BBs.  The discriminator is divided into 3
>> components:
>>
>>  * Base discriminator, assigned by AddDiscriminators
>>
>>  * Duplication factor: used in loop unroll and loop vectorization (when
>> cloning the loop body). This factor will be multiplied by sample counts in
>> create_llvm_prof tool to get the count value before duplication.
>>
>> * Copy Identifier: reserved by not currently used.
>>
>>
>>
>> Duplication factor was explicitly assigned in loop unroll and loop
>> vectorization. Instruction cloning will copy the discriminator. All the
>> cloned instruction instances, other than inlining, loop unroll and
>> vectorization, have the same discriminator. In the create_llvm_prof tool,
>> when assigning the sample count value for <offset_to_func_head,
>> discriminator>, the max sample value among all the clones will be the
>> assigned value.
>>
>>
>>
>> 2.2 Hierarchical discriminators
>>
>> In FS-AFDO, discriminator assigning happens multiple times --
>> conceptually, one assignment for each optimization that can benefit from
>> more precise FS profile data (let’s call it a target optimization pass).
>> The discriminator is divided into multiple sections. Earlier passes have
>> the lower bits and each instance of assigning access the bits in their
>> sections only.
>>
>>
>>
>> For example, there could be 3 rounds of discriminator assigning: the base
>> discriminator which corresponds to existing discriminator assigning,
>> discriminator for pass N, and discriminator for pass M.  Let’s assume each
>> round uses 4 bits. If the instruction Inst1 is from BB0 and it’s copied to
>> BB1, it will have a discriminator 0x1. If BB1:Inst1 is copied to BB2:intr2
>> before Pass N, this instruction will have a new discriminator of 0x11
>> associated with pass N. Similarly, each copy of Inst1 before Pass M will
>> have its own discriminators using bit 8 to bit 12.
>>
>>
>>
>> When converting Perf samples to the AFDO profile, we will have the sample
>> count for each discriminator.  The AFDO profile loader reads all the
>> samples and masks out the bit used by later rounds. For example, when
>> loading the profile for the base discriminator, it only checks the first 4
>> bits. All the samples cloned from the same base discriminator will be
>> aggregated together. Profile loader for pass N will only check the first 8
>> bits of the discriminator and sum up all the samples with the same
>> discriminator bit 0 to bit 7.
>>
>>
>>
>> 2.3 Profile Mismatch
>>
>>
>>
>> The above describes the ideal situation for loading the profile -- all
>> the discriminators in the Perf binary match the discriminator in optimized
>> build. In reality, the compilation for the optimized binary, with a new
>> profile, might perform different transformations (from the compilation of
>> the Perf binary). It can alter the program control flow for the downstream
>> passes, which leads to a discriminator mismatch.
>>
>>
>>
>> If the mismatch happens, the profiling information will not be updated by
>> the SampleLoader to the affected BBs. They will inherit the profile
>> information (like branch probabilities) from the previous round of
>> SampleLoader.
>>
>>
>>
>> The FS component in the discriminators basically uses a sequential
>> number, just like in the base discriminator. But to handle potential
>> mismatch,  we also encode the total number of new discriminators. This
>> prevents the misplacement of profile counts to a wrong code clone. There
>> are limited bits for each FS discriminator component.  We choose to use a
>> hash value as the FS discriminator. The hash function takes the following
>> as the input:
>>
>>  * the sequential number of the current clone,
>>
>>  * the total number of the current clone.
>>
>>
>>
>> A second measure to deal with mismatch is to use the profile coverage. We
>> summarize the samples used by SampleLoad for each function, and compare the
>> total samples in the profile. A warning is printed out if the coverage
>> (ratio of used samples and total sample) is lower than a threshold value.
>> In which case, the user should rebuild the Perf binary and collect the Perf
>> profile again.
>>
>>
>>
>> 2.4 Example of pass pipeline
>>
>> Here is the one sample pass pipeline for AFDO + ThinLTO compilation.
>>
>>
>>
>> // Pass Pipeline:
>>
>> InsertBaseDiscrimator
>>
>> SampleProfileLoader in FE
>>
>> ...
>>
>> CM inline
>>
>> ...
>>
>> SampleProfileLoader in BE
>>
>> ...
>>
>> AddFSDiscriminator(kind_mbp)
>>
>> FSProfileLoader(kind_mbp)
>>
>> MachineBlockPlacement
>>
>> AddFSDiscriminator(kind_branchfolding)
>>
>> FSProfileLoader(kind_branchfolding)
>>
>> BranchFoldingPass
>>
>>>>
>> AddFSDiscriminator(kind_final)
>>
>>
>>
>> In the above pass pipeline. MachineBlockPlacement and BranchFoldingPass
>> are the two target optimization passes. MachineBlockPlacement can obviously
>> be benefited from more precise profile information. The tail merge in
>> BranchFoldPass can undo many unneeded tail duplications in
>> MachineBlockPlacment.
>>
>>
>>
>> Note that we have a FS discriminator assigning pass at the final of
>> compilation pipeline. The reason is to assign a different discriminator for
>> each cloned instruction in the final binary. By doing this, we can get a
>> more precise profile for earlier passes by summing the counter values from
>> different discriminators.
>>
>>
>>
>> 3. A running example
>>
>>
>>
>> The test program is shown as the following. We pay most attention to the
>> branch in line 5 and line 7 in function foo(). Inner loop at line 3 will be
>> fully unrolled after SampleProfileLoader. Outer loop at line 2 is intact by
>> the optimizer.
>>
>> __attribute__((noinline)) int bar(int i){
>>
>>   volatile int j;
>>
>>   j = i;
>>
>>  return j;
>>
>> }
>>
>>
>>
>> __attribute__((noinline)) void work(int i){
>>
>> ...
>>
>> }
>>
>>
>>
>> __attribute__((noinline)) void foo(){ // line 17 of unroll.c
>>
>>   int i, j;
>>
>>   for (j = 0; j < 48; j++)
>>
>>     for (i = 0; i < 4; i++) {
>>
>>       int ii = bar(i + j * 48);
>>
>>       if (ii % 2)                   // line 22 of unroll.c
>>
>>         work(ii * 2);
>>
>>       if (ii % 4)                   // line 24 of unroll.c
>>
>>         work(ii * 3);
>>
>>     }
>>
>> }
>>
>>
>>
>> Current AFDO will produce the following profile:
>>
>> foo:30387110:1606             void foo() {
>>
>>  0:   1606                      int i, j;
>>
>>  2.1: 83748                     for (j = 0; j < 48; j++) {
>>
>>  4:   351836 bar:84932              int ii = bar(i + j * 48);
>>
>>  5:   351836                        if (ii % 2)
>>
>>  6:   351836 work:79741               work(ii * 2)
>>
>>  7:   350420                        if (ii % 4)
>>
>>  8:   338340 work:75287                work(ii * 3)
>>
>>  10:  1302                     }
>>
>> Function foo() entry sample count is 1606. Line 2.1 is the outer loop
>> body with a sample count of 83748. Line 4 to line 8 is inner loop body
>> count. It’s scaled by an unrolled multiplication discriminator of 4. This
>> resulted in a higher value of sample count: it uses the Max of 4 clones and
>> multiplies by 4.
>>
>>
>>
>> Here is the FS AFDO profile. Basically it has a separated profile count
>> for each unrolled instructions.
>>
>>
>>
>> foo:8722583:1597                  void foo() {
>>
>>  0:       1597                      int i, j;
>>
>>  2.1:     83694                     for (j = 0; j < 48; j++)
>>
>>  4:       87205 bar:85542               int ii = bar(i + j * 48);
>>
>>  4.8448:  82042 bar:82155               int ii = bar(i + j * 48);
>>
>>  4.13312: 84795 bar:84802               int ii = bar(i + j * 48);
>>
>>  4.13568: 87820 bar:87718               int ii = bar(i + j * 48);
>>
>>  5:       87205                         if (ii % 2)
>>
>>  5.8448:  82042                         if (ii % 2)
>>
>>  5.13312: 79604                         if (ii %2)
>>
>>  5.13568: 87820                         if (ii %2)
>>
>>  6:       0                               work(ii * 2);
>>
>>  6.8448:  0                               work(ii * 2);
>>
>>  6.13312: 79604 work:79652                work(ii * 2);
>>
>>  6.13568: 87820 work:90500                work(ii * 2);
>>
>>  7:       87493                         if (ii %4)
>>
>>  7.8448:  84624                         if (ii % 4)
>>
>>  7.13312: 71942                         if (ii % 4)
>>
>>  7.13568: 83552                         if (ii % 4)
>>
>>  8:       0                               work(ii * 3);
>>
>>  8.8448:  84624 work:87655                work(ii * 3);
>>
>>  8.13312: 71942 work:75137                work(ii *3);
>>
>>  8.13568: 83552 work:91223                work(ii *3);
>>
>>  10:      1304                     }
>>
>>
>>
>> Let’s look at line 6 and line 8. Each of which has 4 records. When we use
>> these records in SampleLoader, these 4 records will be combined into one
>> after masking out the bits: line 6 will have one record of
>>
>> 6: 167424 work: 170152
>>
>> Line 8 will have one record:
>>
>> 8: 240118 work: 254015
>>
>> Comparing the sample count in line 5 and 7, we can easily see that line 5
>> has ~50% taken and line 7 has ~75% taken. This is in contrast to the
>> current AFDO profile, where both lines have a 100% taken branch.
>>
>>
>>
>> When we load the above profile before the MachineBlockPlacement pass,
>> each record will maintain its own instance. We can see that at line 5, two
>> of the branches have ~0% taken probability, while the rest two have ~100%
>> taken probability. For line 7, one branch has 0% taken probability, while
>> the reset three has ~100% taken probability. We will reset the branch
>> probabilities as shown in the following messages:
>>
>> Set branch fs prob: MBB (1 -> 3): unroll.c:22:11 W=87820  0x00005f85 /
>> 0x80000000 = 0.00% --> 0x80000000 / 0x80000000 = 100.00%
>>
>> Set branch fs prob: MBB (1 -> 2): unroll.c:22:11 W=87820  0x7fffa07b /
>> 0x80000000 = 100.00% --> 0x00000000 / 0x80000000 = 0.00%
>>
>> Set branch fs prob: MBB (3 -> 5): unroll.c:24:11 W=87820  0x04a8dc00 /
>> 0x80000000 = 3.64% --> 0x80000000 / 0x80000000 = 100.00%
>>
>> Set branch fs prob: MBB (3 -> 4): unroll.c:24:11 W=87820  0x7b572400 /
>> 0x80000000 = 96.36% --> 0x00000000 / 0x80000000 = 0.00%
>>
>> Set branch fs prob: MBB (9 -> 11): unroll.c:22:11 W=87820  0x00005f85 /
>> 0x80000000 = 0.00% --> 0x80000000 / 0x80000000 = 100.00%
>>
>> Set branch fs prob: MBB (9 -> 10): unroll.c:22:11 W=87820  0x7fffa07b /
>> 0x80000000 = 100.00% --> 0x00000000 / 0x80000000 = 0.00%
>>
>> Set branch fs prob: MBB (15 -> 17): unroll.c:24:11 W=87820  0x04a8dc00 /
>> 0x80000000 = 3.64% --> 0x17248215 / 0x80000000 = 18.08%
>>
>> Set branch fs prob: MBB (15 -> 16): unroll.c:24:11 W=87820  0x7b572400 /
>> 0x80000000 = 96.36% --> 0x68db7deb / 0x80000000 = 81.92%
>>
>>
>>
>> 4. Implementation in LLVM
>>
>>
>>
>> 4.1 AFDO profile data format
>>
>> The AFDO profile format remains largely the same. Samples could be
>> splitted into multiple parts with different discriminator copy components.
>> SampleLoader will aggregate the samples dynamically when reading the AFDO
>> profile based on the discriminator masks. Current discriminator encoding
>> will be removed and related code in the compiler will be removed.
>>
>>
>>
>> The hierarchical discriminator in FS-AFDO captures most of the cases of
>> multiplication discriminator in currenting encoding scheme. As a matter of
>> fact, the resulting count values are usually more precise as each
>> individual discriminator may have different count values. There is one
>> exception: when the unrolling body has a single BB. FS-AFDO will have the
>> correct sample count after unrolling as all the instructions residing in
>> the same BB. In our prototype implementation, we did not find this issue to
>> have a visible performance impact.
>>
>>
>>
>> 4.2 Compiler changes
>>
>>
>>
>> 4.2.1 Add two new CodeGen passes:
>>
>> AddFSDiscriminatorsPass: adds flow sensitive discriminators.
>>
>> FSProfileLoaderPass: read FS profile and set branch probabilities.
>>
>> FSProfileLoadPass will reuse most of the IR level SampleProfileLoader
>> algorithm and heuristics, but we need to reimplement at Machine level. We
>> will not need to update branch metadata. We will change the branch
>> probabilities directly.
>>
>>
>>
>> 4.2.2 Enhance SampleProfileReader to handle FS discriminators.
>>
>> Add a new field of MaskedBitFrom to SampleProfileReader. This is the
>> lower mask bit being used for this pass instance. For each profile loader,
>> we dynamically compute samples based on bit masks -- this is done in
>> readImpl() method where the masked discriminators are used to aggregate the
>> samples.
>>
>>
>>
>> 4.3 AutoFDO tool changes
>>
>> The position type used in PositionCountMap and Callsite type is of type
>> uint32. It is divided into high 16 bits as file offset and lower 16 bits as
>> discriminator. This is probably OK for the current discriminators as they
>> are numbered sequentially.  But this would create too many clashes in FS
>> discriminators. We need to extend to uint64 for the position: a full 32-bit
>> for line number and full 32-bit for discriminators.
>>
>>
>>
>> 5. Experimental performance
>>
>> A prototype was implemented and a performance test was done on critical
>> applications in Google.
>>
>> Test binary:
>>
>> (1) build the first round binary with an existing AFDO profile with
>> AddFSDiscriminator enabled.
>>
>> (2) collect the PERF samples and convert them to the FS AFDO profile.
>>
>> (3) build an optimized binary using the FSAFDO profile from step (2).
>>
>>
>>
>> Base binary:
>>
>> (1) build the first round binary with an existing AFDO profile
>>
>> (2) collect the PERF samples and convert them to the AFDO profile.
>>
>> (3) build an optimized binary using profile from step (2).
>>
>>
>>
>> Run time performance:
>>
>> The test binary shows 0.9% to 1.3% performance improvement over the base
>> binary.
>>
>>
>>
>> Profile size:
>>
>> Profile size is modestly increased. Using the Google benchmark as the
>> example, the profile size for the text form increased by 4% to 7% while the
>> binary form profile size increased by 5% to 8%.
>>
>>
>>
>> Compiler time:
>>
>> Prototype implementation shows about 10% increase of build time (measued
>> by the elapsed time which includes two extra FSAddDiscrimniator passes and
>> one extra FSProfileSampleLoader pass). Profile readings are not shared b/w
>> FSProfileSampleLoader and current ProfileSampleLoader. We expect a smaller
>> compiler time increase with better data structures that share a profile
>> loader.
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201120/d6294c7b/attachment.html>


More information about the llvm-dev mailing list