[all-commits] [llvm/llvm-project] b4a17b: [DependenceAnalysis] Extending SIV to handle fusab...

Aiden Grossman via All-commits all-commits at lists.llvm.org
Fri Sep 19 21:43:42 PDT 2025


  Branch: refs/heads/users/boomanaiden154/main.clang-enable-lit-internal-shell-by-default
  Home:   https://github.com/llvm/llvm-project
  Commit: b4a17b13b7327e583fb16384004155508f31a09d
      https://github.com/llvm/llvm-project/commit/b4a17b13b7327e583fb16384004155508f31a09d
  Author: Alireza Torabian <alireza.torabian at huawei.com>
  Date:   2025-09-19 (Fri, 19 Sep 2025)

  Changed paths:
    M llvm/include/llvm/Analysis/DependenceAnalysis.h
    M llvm/lib/Analysis/DependenceAnalysis.cpp
    A llvm/test/Analysis/DependenceAnalysis/SameSDLoops.ll

  Log Message:
  -----------
  [DependenceAnalysis] Extending SIV to handle fusable loops (#128782)

When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
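
To make "direction and distance" concrete, here is a minimal sketch of the classical strong SIV test on subscripts of the form `a*i + c`. It is not the code in `DependenceAnalysis.cpp`: the helper `strong_siv` and its parameters are hypothetical. It assumes both loops iterate over `0..upper_bound` inclusive, which stands in for the "same iteration space" precondition described above.

```python
from typing import Optional, Tuple


def strong_siv(coeff: int, src_const: int, dst_const: int,
               upper_bound: int) -> Optional[Tuple[int, str]]:
    """Classical strong SIV test (hypothetical helper, not LLVM's API).

    The source accesses A[coeff*i + src_const] and the destination accesses
    A[coeff*j + dst_const], with i and j both ranging over 0..upper_bound.
    Returns (distance, direction), or None when no dependence is possible.
    """
    if coeff == 0:
        raise ValueError("strong SIV needs a non-zero coefficient")
    delta = src_const - dst_const
    # coeff*i + src_const == coeff*j + dst_const  =>  j - i == delta / coeff
    if delta % coeff != 0:
        return None  # the distance is not an integer, so no dependence
    distance = delta // coeff
    if abs(distance) > upper_bound:
        return None  # the two iterations cannot both be in range
    direction = "<" if distance > 0 else "=" if distance == 0 else ">"
    return distance, direction


# First loop writes A[i + 1]; a second loop with the same bounds reads A[i].
print(strong_siv(coeff=1, src_const=1, dst_const=0, upper_bound=99))
```

In this example the element written at iteration `i` of the first loop is read at iteration `i + 1` of the second, so the sketch reports a distance of 1 with direction `<`.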


  Commit: 4bc9d29fbaa7f0d0a3b522e1e085e228d5d40d76
      https://github.com/llvm/llvm-project/commit/4bc9d29fbaa7f0d0a3b522e1e085e228d5d40d76
  Author: Chengjun <chengjunp at Nvidia.com>
  Date:   2025-09-19 (Fri, 19 Sep 2025)

  Changed paths:
    M llvm/lib/Transforms/Scalar/SROA.cpp
    A llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll
    A llvm/test/Transforms/SROA/vector-promotion-via-tree-structure-merge.ll

  Log Message:
  -----------
  [SROA] Use tree-structure merge to remove alloca (#152793)

This patch introduces a new optimization in SROA that handles the
pattern where multiple non-overlapping vector `store`s completely fill
an `alloca`.

The current approach to handle this pattern introduces many `.vecexpand`
and `.vecblend` instructions, which can dramatically slow down
compilation when dealing with large `alloca`s built from many small
vector `store`s. For example, consider an `alloca` of type `<128 x
float>` filled by 64 `store`s of `<2 x float>` each. The current
implementation requires:

- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain

This kind of IR is both difficult to optimize and slow to compile,
particularly impacting the `InstCombine` pass.

This patch introduces a tree-structured merge approach that
significantly reduces the number of operations and improves compilation
performance.

Key features:

- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored
values
- Reduces the number of intermediate operations compared to linear
merging
- Eliminates the long dependency chains that hurt optimization

Example transformation:

```
; Before (the stores do not have to be in order):
%alloca = alloca <8 x float>
%p1 = getelementptr inbounds i8, ptr %alloca, i64 8
%p2 = getelementptr inbounds i8, ptr %alloca, i64 16
%p3 = getelementptr inbounds i8, ptr %alloca, i64 24
store <2 x float> %val0, ptr %alloca       ; elements 0-1
store <2 x float> %val2, ptr %p2           ; elements 4-5
store <2 x float> %val1, ptr %p1           ; elements 2-3
store <2 x float> %val3, ptr %p3           ; elements 6-7
%result = load <8 x float>, ptr %alloca

; After (tree-structured merge):
%shuffle0 = shufflevector <2 x float> %val0, <2 x float> %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector <2 x float> %val2, <2 x float> %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector <4 x float> %shuffle0, <4 x float> %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```

Benefits:

- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns

For some large cases, the compile time can be reduced from about 60s to
less than 3s.
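
The logarithmic-depth claim can be checked with a small model. The sketch below is not the SROA implementation: `tree_merge` is a hypothetical helper in which each list concatenation stands in for one `shufflevector`, and it assumes the stored slices are already sorted by offset. It ignores the operand widening that real `shufflevector`s would need when the two sides differ in length.

```python
from typing import List, Tuple


def tree_merge(parts: List[List[float]]) -> Tuple[List[float], int, int]:
    """Merge slices pairwise; returns (merged, merge_ops, depth).

    Each pairwise concatenation stands in for one shufflevector.
    `parts` must already be sorted by offset and must not be empty.
    """
    if not parts:
        raise ValueError("need at least one slice")
    ops = 0
    depth = 0
    level = parts
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] + level[i + 1])  # one merge per pair
            ops += 1
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd slice is carried to the next level
        level = nxt
        depth += 1
    return level[0], ops, depth


# 64 stores of <2 x float> filling a <128 x float> alloca.
stores = [[float(2 * i), float(2 * i + 1)] for i in range(64)]
merged, ops, depth = tree_merge(stores)
print(len(merged), ops, depth)  # 128 63 6
```

For the 64-store case this model needs 63 merges arranged in 6 levels, compared with the chain of 64 `shufflevector`s and 64 `select`s described earlier in the message.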

---------

Co-authored-by: chengjunp <chengjunp at nividia.com>


  Commit: f437309e28daaf7443cb200d789a9304aac250d1
      https://github.com/llvm/llvm-project/commit/f437309e28daaf7443cb200d789a9304aac250d1
  Author: Craig Topper <craig.topper at sifive.com>
  Date:   2025-09-19 (Fri, 19 Sep 2025)

  Changed paths:
    M llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp

  Log Message:
  -----------
  [RISCV] Update comments in RISCVMatInt to reflect we don't always use ADDIW after LUI now. NFC (#159829)

Since 3d2650bdeb8409563d917d8eef70b906323524ef, the simm32 base case
uses lui+addiw only when necessary.

The worst-case 8-instruction sequence doesn't leave a full 32 bits for
the LUI+ADDI(W) after the three 12-bit ADDI and SLLI pairs are created,
so we never generate LUI+ADDIW in the worst-case sequence.
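
The bit budget behind that statement can be written out as plain arithmetic. This is a simplified sketch, not code from `RISCVMatInt.cpp`: the constant names are made up for illustration, and it ignores sign-extension adjustments and shift amounts larger than 12.

```python
XLEN = 64            # width of the constant being materialized
ADDI_BITS = 12       # each ADDI contributes a 12-bit immediate
SLLI_ADDI_PAIRS = 3  # pairs in the worst-case sequence, per the message

# Instruction count: LUI + ADDI(W), followed by three SLLI+ADDI pairs.
total_instructions = 2 + 2 * SLLI_ADDI_PAIRS

# Bits left for the leading LUI+ADDI(W) once the pairs take their share.
bits_left_for_lui = XLEN - SLLI_ADDI_PAIRS * ADDI_BITS

print(total_instructions, bits_left_for_lui)  # 8 28
```

Under these assumptions only 28 bits remain for the leading pair, which is short of the full 32 bits mentioned above.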


  Commit: c91fa95fc7426215817c3d20564558a310d587e2
      https://github.com/llvm/llvm-project/commit/c91fa95fc7426215817c3d20564558a310d587e2
  Author: Aiden Grossman <aidengrossman at google.com>
  Date:   2025-09-19 (Fri, 19 Sep 2025)

  Changed paths:
    M llvm/lib/Transforms/IPO/SampleProfile.cpp

  Log Message:
  -----------
  [SampleProfile] Always use FAM to get ORE

The split in this code path was left over from when we had to support
the old PM and the new PM at the same time. Now that the legacy pass has
been dropped, this simplifies the code a little bit and swaps pointers
for references in a couple places.

Reviewers: aeubanks, efriedma-quic, wlei-llvm

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/159858


  Commit: 4a9fdda9882da8f054009c249c4bb0ef18f6f21a
      https://github.com/llvm/llvm-project/commit/4a9fdda9882da8f054009c249c4bb0ef18f6f21a
  Author: Roman Belenov <rbelenov at gmail.com>
  Date:   2025-09-19 (Fri, 19 Sep 2025)

  Changed paths:
    M llvm/docs/CommandGuide/llvm-mca.rst
    M llvm/include/llvm/MCA/CustomBehaviour.h
    M llvm/include/llvm/MCA/InstrBuilder.h
    M llvm/lib/MCA/CustomBehaviour.cpp
    M llvm/lib/MCA/InstrBuilder.cpp
    A llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s
    A llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s
    M llvm/tools/llvm-mca/llvm-mca.cpp
    M llvm/unittests/tools/llvm-mca/MCATestBase.cpp
    M llvm/unittests/tools/llvm-mca/MCATestBase.h
    M llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp

  Log Message:
  -----------
  [MCA] Enable customization of individual instructions (#155420)

Currently MCA takes instruction properties from the scheduling model.
However, some instructions may execute differently depending on external
factors. For example, the latency of a memory instruction may vary
depending on whether the load comes from the L1 cache, L2 or DRAM. While
MCA, as a static analysis tool, cannot model such differences (and
currently makes a static decision, e.g. all memory ops are treated as L1
accesses), it makes sense to allow manual modification of instruction
properties to model different behavior (e.g. the sensitivity of code
performance to cache misses in a particular load instruction). This
patch addresses this need.

The library modification is intentionally generic: arbitrary
modifications to InstrDesc are allowed. The tool support is currently
limited to changing instruction latencies (a single number applies to
all output arguments and MaxLatency) via comments in the input assembler
code; the format is like this:

add (%eax), %eax // LLVM-MCA-LATENCY:100
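
As an illustration of the comment format only, the following sketch extracts the requested latency from a line of assembly. It is not how `llvm-mca` parses its input: the helper `latency_override` is hypothetical, and the regular expression assumes the marker looks exactly like the example above, with no spaces around the colon.

```python
import re
from typing import Optional

# Assumed marker syntax, copied from the example in the commit message.
LATENCY_RE = re.compile(r"LLVM-MCA-LATENCY:(\d+)")


def latency_override(asm_line: str) -> Optional[int]:
    """Return the latency requested in a trailing comment, if any."""
    match = LATENCY_RE.search(asm_line)
    return int(match.group(1)) if match else None


print(latency_override("add (%eax), %eax // LLVM-MCA-LATENCY:100"))  # 100
print(latency_override("add (%eax), %eax"))                          # None
```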

Users of the MCA library can already make additional customizations; the
command-line tool can be extended in the future.

Note that InstructionView currently shows per-instruction information
according to the scheduling model and is not affected by this change.

See https://github.com/llvm/llvm-project/issues/133429 for additional
clarifications (including an explanation of why existing customization
mechanisms do not provide the required functionality).

---------

Co-authored-by: Min-Yih Hsu <min at myhsu.dev>


  Commit: 9542d0a0c661be92db950514b5dc9c5ea6d953af
      https://github.com/llvm/llvm-project/commit/9542d0a0c661be92db950514b5dc9c5ea6d953af
  Author: Joseph Huber <huberjn at outlook.com>
  Date:   2025-09-19 (Fri, 19 Sep 2025)

  Changed paths:
    M llvm/cmake/modules/HandleLLVMOptions.cmake

  Log Message:
  -----------
  [libc] Fix libc build on NVPTX using wrong linker flag

Summary:
Ugly hacks abound. We can't test linker flags generically because not
everyone has `nvlink` as a binary on their machine, which would result
in every single flag being reported as unsupported. This is the only
'linker flag' check we have, so just hard-code it off.


  Commit: a38794ff3d47588cb226881eb048cb2333962ab9
      https://github.com/llvm/llvm-project/commit/a38794ff3d47588cb226881eb048cb2333962ab9
  Author: Aiden Grossman <aidengrossman at google.com>
  Date:   2025-09-19 (Fri, 19 Sep 2025)

  Changed paths:
    M clang/test/ClangScanDeps/pr61006.cppm
    M clang/test/ClangScanDeps/resource_directory.c
    M clang/test/Driver/env.c
    M clang/test/Driver/program-path-priority.c
    M clang/test/Modules/relative-resource-dir.m

  Log Message:
  -----------
  [Clang] Rewrite tests using subshells to set env variables

Now that we have the %readfile substitution, we can rewrite these tests
that were using env variable subshells to write the output of the
command into a file and then load it where it is needed using readfile.

This does involve one invocation of bash so that we are using the system
env binary, which does support redirection into a tool like grep. We
already do this in one LLVM test. I'm not happy about needing that, but
the more principled way to solve it involves reworking how in-process
builtins work within lit. That is something we want to do eventually,
but not something that I think should block this patch.

Reviewers: cmtice, petrhosek, ilovepi

Reviewed By: cmtice, ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/158446


  Commit: 9c246a9c12af9e21ca912ae94ec5816cb684a924
      https://github.com/llvm/llvm-project/commit/9c246a9c12af9e21ca912ae94ec5816cb684a924
  Author: Aiden Grossman <aidengrossman at google.com>
  Date:   2025-09-20 (Sat, 20 Sep 2025)

  Changed paths:
    M llvm/cmake/modules/HandleLLVMOptions.cmake
    M llvm/docs/CommandGuide/llvm-mca.rst
    M llvm/include/llvm/Analysis/DependenceAnalysis.h
    M llvm/include/llvm/MCA/CustomBehaviour.h
    M llvm/include/llvm/MCA/InstrBuilder.h
    M llvm/lib/Analysis/DependenceAnalysis.cpp
    M llvm/lib/MCA/CustomBehaviour.cpp
    M llvm/lib/MCA/InstrBuilder.cpp
    M llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
    M llvm/lib/Transforms/IPO/SampleProfile.cpp
    M llvm/lib/Transforms/Scalar/SROA.cpp
    A llvm/test/Analysis/DependenceAnalysis/SameSDLoops.ll
    A llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll
    A llvm/test/Transforms/SROA/vector-promotion-via-tree-structure-merge.ll
    A llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s
    A llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s
    M llvm/tools/llvm-mca/llvm-mca.cpp
    M llvm/unittests/tools/llvm-mca/MCATestBase.cpp
    M llvm/unittests/tools/llvm-mca/MCATestBase.h
    M llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp

  Log Message:
  -----------
  [𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6

[skip ci]


Compare: https://github.com/llvm/llvm-project/compare/93f6691db215...9c246a9c12af
