[PATCH] D148855: [SLP]Improve tryToGatherExtractElements by using per-register analysis.

Thu Nov 2 15:00:31 PDT 2023

ABataev added a comment.

In D148855#4656048 <https://reviews.llvm.org/D148855#4656048>, @mstorsjo wrote:

> In D148855#4656008 <https://reviews.llvm.org/D148855#4656008>, @ABataev wrote:
>
>> In D148855#4655968 <https://reviews.llvm.org/D148855#4655968>, @mstorsjo wrote:
>>
>>> This seems to have caused a misoptimization in ffmpeg for aarch64.
>>>
>>> To reproduce, you can follow these steps, on aarch64 Linux:
>>>
>>>   $ git clone https://github.com/ffmpeg/ffmpeg
>>>   $ mkdir ffmpeg-build
>>>   $ cd ffmpeg-build
>>>   $ ../ffmpeg/configure --cc=clang --samples=$(pwd)/../fate-samples
>>>   $ make fate-rsync
>>>   $ make -j$(nproc) fate-vp9-00-quantizer-18
>>>
>>> The misoptimized object file is `libavcodec/vp9dsp_8bpp.o`.
>>>
>>> The standalone preprocessed input for that object file is available at <https://martin.st/temp/vp9dsp_8bpp-preproc.c>. You can reproduce the misoptimization with `clang -target aarch64-linux-gnu -c -O3 vp9dsp_8bpp-preproc.c -o vp9dsp_8bpp.o`.
>>>
>>> Can you look into this, and possibly revert if fixing takes some time?
>>
>> I compared the output; the LLVM IR actually becomes smaller. I cannot do a perf run. It looks like the lowering does not do a good job, so we need to create a ticket against the AArch64 codegen to improve it.
>
> I'm not saying the code became slower - I'm saying the code no longer produces the right result.
>
> I'll push a revert for now, to unbreak things.

Ah, OK, now I see. Go ahead and revert it; I'll investigate tomorrow.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148855/new/

https://reviews.llvm.org/D148855