[llvm-dev] LLVM LNT floating point performance tests on X86 - using the llvm-test-suite benchmarks

Johannes Doerfert via llvm-dev llvm-dev at lists.llvm.org
Wed May 19 08:30:30 PDT 2021


On 5/19/21 9:18 AM, Blower, Melanie I wrote:
> Thank you, I have more questions.
>
> I am using a shared Linux system (Intel(R) Xeon(R) Platinum 8260M CPU @ 2.40GHz) to build and run the llvm-test-suite. Do I need to execute the tests on a quiescent system? I tried running a "null check", i.e., executing llvm-lit twice on the same set of test executables and collecting results from both runs; the differences between the two runs (which ideally would be zero, since the executables are identical) ranged from +14% to -18%.
>
> What is the acceptable tolerance?

I'm not following what the "results" are here.
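
If you mean the timing numbers: noise in the +/-15% range on a shared machine is
not unusual, and the cheapest mitigation is to collect several llvm-lit samples per
configuration and let compare.py aggregate them. A rough sketch, assuming the
build-directory names from your commands below (and that compare.py's multi-file
"vs" syntax hasn't changed):

# collect a few samples per build configuration
for i in 1 2 3; do
  (cd build-llorg-default  && llvm-lit -j 1 -o results.$i.json .)
  (cd build-llorg-Contract && llvm-lit -j 1 -o results.$i.json .)
done
# compare.py accepts several result files per side, separated by "vs",
# and aggregates across the samples
python3 test-suite/utils/compare.py --filter-short \
    build-llorg-default/results.*.json vs build-llorg-Contract/results.*.json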


> I work on clang, not backend optimization, so I am not familiar with the analysis techniques for understanding which optimization transformations changed due to my patch. Do you have any tips about that?
It doesn't necessarily matter. If you want to know without any other
information, you could compare the outputs of -mllvm -print-after-all
w/ and w/o your patch. I don't think it is strictly necessary if the
tests are not impacted too much.
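
Something along these lines (a rough sketch; the paths are hypothetical, and you'd
want to grab the exact compile command for symm.c from your build tree, e.g. via
make VERBOSE=1, since Polybench needs its include paths and macros):

OLD_CLANG=/path/to/unpatched/clang     # hypothetical install locations
NEW_CLANG=/path/to/patched/clang
FLAGS="-O3 ..."                        # the flags the test-suite build actually used
# dump the IR after every pass (goes to stderr) and diff the two logs
$OLD_CLANG $FLAGS -mllvm -print-after-all -c symm.c 2> passes.old.log
$NEW_CLANG $FLAGS -mllvm -print-after-all -c symm.c 2> passes.new.log
diff -u passes.old.log passes.new.log | less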


> Using the real test (unpatched compiler versus patched compiler), I compared the assembly for symm.test, since it's SingleSource, built with the two different compilers.
>
> test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.test     10.88      9.93  -8.7%
>
> Here's the only difference in the .s file; it seems unlikely that this would account for an 8% difference in time.
>
>
> .LBB6_21:                               #   in Loop: Header=BB6_18 Depth=2
>          leaq    (%r12,%r10), %rdx
>          movsd   (%rdx,%rax,8), %xmm3            # xmm3 = mem[0],zero
>          mulsd   %xmm1, %xmm3
>          movsd   (%r9), %xmm4                    # xmm4 = mem[0],zero
>          mulsd   %xmm0, %xmm4
>          mulsd   (%rsi), %xmm4
>          addsd   %xmm3, %xmm4
>
> With patch for contract changes:
> .LBB6_21:                               #   in Loop: Header=BB6_18 Depth=2
>          movsd   (%r9), %xmm3                    # xmm3 = mem[0],zero
>          mulsd   %xmm0, %xmm3
>          mulsd   (%rsi), %xmm3
>          leaq    (%r12,%r10), %rdx
>          movsd   (%rdx,%rax,8), %xmm4            # xmm4 = mem[0],zero
>          mulsd   %xmm1, %xmm4
>          addsd   %xmm3, %xmm4
>
> The difference for test flops-5 was 25%, but the code differences are bigger. I can try -print-after-all as a first step.

I'm not sure we need to look at this right now.


> I'm nowhere near "generating new hash values", but won't the hash value be relative to the target microarchitecture? So if my system is a different arch than the bot's, the hash value I compute here wouldn't compare equal to the bot's hash?
>
> This is how I tested. Is the build line correct for this purpose (caches/O3.cmake), or should I use different options when creating the test executables?
>   
> git clone https://github.com/llvm/llvm-test-suite.git test-suite
> cmake -DCMAKE_C_COMPILER=/iusers/sandbox/llorg-ContractOn/deploy/linux_prod/bin/clang \
>     -DTEST_SUITE_BENCHMARKING_ONLY=true -DTEST_SUITE_RUN_BENCHMARKS=true \
>     -C/iusers/test-suite/cmake/caches/O3.cmake \
>       /iusers/test-suite
> make
> llvm-lit -v -j 1 -o results.json .
> (Repeat in a different build directory using the unmodified compiler)
> python3 test-suite/utils/compare.py -f --filter-short  build-llorg-default/results.json build-llorg-Contract/results.json >& my-result.txt
>
I'm a little confused about what you are doing / trying to do. I was expecting
you to run the symm executable compiled w/ and w/o your patch, then look at the
numbers that are printed at the end. So compare the program results, not the
compile or execution time. If the results are pretty much equivalent, we can
use the results w/ the patch to create a new hash file. If not, we need to
investigate why. Does that make sense?
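
Roughly something like this (paths guessed from your build tree; Polybench dumps
its arrays to stderr, I believe, so capture both streams):

OLD=build-llorg-default/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm
NEW=build-llorg-Contract/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm
$OLD > out.default.txt 2>&1
$NEW > out.contract.txt 2>&1
# identical, or only off in the last digits?
diff out.default.txt out.contract.txt
md5sum out.default.txt out.contract.txt

If the outputs agree (or only differ within tolerance), the new reference hash is
essentially the md5sum of the patched run's output; I believe HashProgramOutput.sh
in the test-suite checkout is the helper the build uses to produce it.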

~ Johannes


>
>> -----Original Message-----
>> From: Johannes Doerfert <johannesdoerfert at gmail.com>
>> Sent: Tuesday, May 18, 2021 8:25 PM
>> To: Blower, Melanie I <melanie.blower at intel.com>; hal.finkle.llvm at gmail.com;
>> spatel+llvm at rotateright.com; llvm-dev <llvm-dev at lists.llvm.org>;
>> florian_hahn at apple.com
>> Subject: Re: LLVM LNT floating point performance tests on X86 - using the llvm-
>> test-suite benchmarks
>>
>> You can run the LNT tests locally, and I would expect the tests to be impacted
>> (on X86).
>>
>> The Polybench benchmarks, and probably some others, have hashed result files.
>> Thus, any change to the output is flagged, regardless of how minor. I'd run it
>> without and with this patch and compare the results. If they are within the
>> expected tolerance, I'd recreate the hash files for them and create a dependent
>> commit for the LLVM test suite.
>>
>> Does that make sense?
>>
>> ~ Johannes
>>
>>
>> On 5/18/21 3:32 PM, Blower, Melanie I wrote:
>>> Hello.
>>> I have a patch to commit to the community
>>> (https://reviews.llvm.org/D74436?id=282577) that changes command line
>>> settings for floating point. When I committed it previously, it was ultimately
>>> rolled back due to bot failures with LNT.
>>> Looking for suggestions on how to use the llvm-test-suite benchmarks to
>>> analyze this issue so I can commit this change.
>>> We think the key difference in the tests that regressed when I tried to commit
>>> the change was caused by differences in unrolling decisions when the fmuladd
>>> intrinsic was present.
>>> As far as I can tell, the LNT bots aren't currently running on any x86 systems,
>>> so I have no idea what settings the bots used when they were running. I'm really
>>> not sure how to proceed.
>>> It seems to me that FMA should give better performance on any non-trivial
>>> benchmark, on systems that support it.
>>> Thanks!

