[all-commits] [llvm/llvm-project] b4a17b: [DependenceAnalysis] Extending SIV to handle fusab...
Aiden Grossman via All-commits
all-commits at lists.llvm.org
Fri Sep 19 21:43:42 PDT 2025
Branch: refs/heads/users/boomanaiden154/main.clang-enable-lit-internal-shell-by-default
Home: https://github.com/llvm/llvm-project
Commit: b4a17b13b7327e583fb16384004155508f31a09d
https://github.com/llvm/llvm-project/commit/b4a17b13b7327e583fb16384004155508f31a09d
Author: Alireza Torabian <alireza.torabian at huawei.com>
Date: 2025-09-19 (Fri, 19 Sep 2025)
Changed paths:
M llvm/include/llvm/Analysis/DependenceAnalysis.h
M llvm/lib/Analysis/DependenceAnalysis.cpp
A llvm/test/Analysis/DependenceAnalysis/SameSDLoops.ll
Log Message:
-----------
[DependenceAnalysis] Extending SIV to handle fusable loops (#128782)
When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
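To illustrate what "direction" and "distance" mean here, the following is a minimal sketch of the textbook strong SIV test. It treats the two loops as if they shared one induction variable over the same iteration space. It is not the code in DependenceAnalysis.cpp; the function name and parameters are hypothetical.

```python
from typing import Optional, Tuple


def strong_siv(coeff: int, c_src: int, c_dst: int,
               trip_count: int) -> Optional[Tuple[int, str]]:
    """Textbook strong SIV test for subscripts coeff*i + c_src and coeff*i' + c_dst.

    Assumes both accesses iterate over 0 <= i, i' < trip_count (hypothetical
    setup). Returns (distance, direction), or None if independence is proven.
    """
    if coeff == 0:
        raise ValueError("strong SIV needs a nonzero coefficient")
    delta = c_src - c_dst
    # coeff*i + c_src == coeff*i' + c_dst  implies  i' - i == delta / coeff
    if delta % coeff != 0:
        return None  # no integer solution, so no dependence
    distance = delta // coeff
    if abs(distance) > trip_count - 1:
        return None  # the two iterations can never both be in range
    direction = "<" if distance > 0 else "=" if distance == 0 else ">"
    return distance, direction


# First loop writes A[i + 2], second loop reads A[i], both with 100 iterations.
print(strong_siv(1, 2, 0, 100))  # (2, '<')
```

A result of `None` means the sketch proved independence; a distance of 2 with direction `<` means the read happens two iterations after the matching write.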
Commit: 4bc9d29fbaa7f0d0a3b522e1e085e228d5d40d76
https://github.com/llvm/llvm-project/commit/4bc9d29fbaa7f0d0a3b522e1e085e228d5d40d76
Author: Chengjun <chengjunp at Nvidia.com>
Date: 2025-09-19 (Fri, 19 Sep 2025)
Changed paths:
M llvm/lib/Transforms/Scalar/SROA.cpp
A llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll
A llvm/test/Transforms/SROA/vector-promotion-via-tree-structure-merge.ll
Log Message:
-----------
[SROA] Use tree-structure merge to remove alloca (#152793)
This patch introduces a new optimization in SROA that handles the
pattern where multiple non-overlapping vector `store`s completely fill
an `alloca`.
The current approach to handle this pattern introduces many `.vecexpand`
and `.vecblend` instructions, which can dramatically slow down
compilation when dealing with large `alloca`s built from many small
vector `store`s. For example, consider an `alloca` of type `<128 x
float>` filled by 64 `store`s of `<2 x float>` each. The current
implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain
This kind of IR is both difficult to optimize and slow to compile,
particularly impacting the `InstCombine` pass.
This patch introduces a tree-structured merge approach that
significantly reduces the number of operations and improves compilation
performance.
Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored
values
- Reduces the number of intermediate operations compared to linear
merging
- Eliminates the long dependency chains that hurt optimization
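The first key feature above (stores that fill the `alloca` with no gaps and no overlap) can be sketched as a simple interval check. This is only an illustration under the assumption that each store is described by a byte offset and a positive size; the function name and data layout are hypothetical and do not come from SROA.cpp.

```python
from typing import List, Tuple


def stores_fill_alloca(alloca_size: int, stores: List[Tuple[int, int]]) -> bool:
    """Return True if the (offset, size) byte ranges tile [0, alloca_size) exactly."""
    next_offset = 0
    for offset, size in sorted(stores):
        # offset > next_offset is a gap, offset < next_offset is an overlap
        if offset != next_offset:
            return False
        next_offset = offset + size
    return next_offset == alloca_size


# An <8 x float> alloca is 32 bytes; four <2 x float> stores of 8 bytes each,
# listed out of order.
print(stores_fill_alloca(32, [(0, 8), (16, 8), (8, 8), (24, 8)]))  # True
print(stores_fill_alloca(32, [(0, 8), (16, 8), (24, 8)]))          # False
```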
Example transformation:
```
; Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
%p1 = getelementptr inbounds i8, ptr %alloca, i64 8
%p2 = getelementptr inbounds i8, ptr %alloca, i64 16
%p3 = getelementptr inbounds i8, ptr %alloca, i64 24
store <2 x float> %val0, ptr %alloca ; elements 0-1
store <2 x float> %val2, ptr %p2 ; elements 4-5
store <2 x float> %val1, ptr %p1 ; elements 2-3
store <2 x float> %val3, ptr %p3 ; elements 6-7
%result = load <8 x float>, ptr %alloca
; After (tree-structured merge):
%shuffle0 = shufflevector <2 x float> %val0, <2 x float> %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector <2 x float> %val2, <2 x float> %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector <4 x float> %shuffle0, <4 x float> %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns
For some large cases, the compile time can be reduced from about 60s to
less than 3s.
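The operation count and depth claimed above can be checked with a small model. This is a sketch, not the SROA implementation: Python lists stand in for vectors, list concatenation stands in for one concatenating `shufflevector`, and the parts are assumed to be already sorted by offset. The name `tree_merge` is hypothetical, and carrying an odd leftover part up unchanged is a simplification.

```python
from typing import List, Tuple


def tree_merge(parts: List[List[float]]) -> Tuple[List[float], int, int]:
    """Merge parts pairwise, level by level; return (result, merges, depth)."""
    if not parts:
        raise ValueError("need at least one part")
    merges = 0
    depth = 0
    level = parts
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            # One list concatenation models one concatenating shufflevector.
            merged.append(level[i] + level[i + 1])
            merges += 1
        if len(level) % 2 == 1:
            merged.append(level[-1])  # odd leftover carried up unchanged
        level = merged
        depth += 1
    return level[0], merges, depth


# 64 two-element parts, as in the <128 x float> example from the commit message.
parts = [[float(2 * i), float(2 * i + 1)] for i in range(64)]
result, merges, depth = tree_merge(parts)
print(len(result), merges, depth)  # 128 63 6
```

In this model the 64 stores need 63 merges at a depth of 6, compared with the 64 `shufflevector`s plus 64 `select`s in one long chain that the commit message reports for the existing approach.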
---------
Co-authored-by: chengjunp <chengjunp at nividia.com>
Commit: f437309e28daaf7443cb200d789a9304aac250d1
https://github.com/llvm/llvm-project/commit/f437309e28daaf7443cb200d789a9304aac250d1
Author: Craig Topper <craig.topper at sifive.com>
Date: 2025-09-19 (Fri, 19 Sep 2025)
Changed paths:
M llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
Log Message:
-----------
[RISCV] Update comments in RISCVMatInt to reflect we don't always use ADDIW after LUI now. NFC (#159829)
The simm32 base case only uses lui+addiw when necessary after
3d2650bdeb8409563d917d8eef70b906323524ef.
The worst-case 8-instruction sequence doesn't leave a full 32 bits for
the LUI+ADDI(W) after the 3 12-bit ADDI and SLLI pairs are created. So
we will never generate LUI+ADDIW in the worst-case sequence.
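A back-of-the-envelope check of that argument, assuming a 64-bit register and that each SLLI in the worst case shifts by exactly 12 bits (both are assumptions of this sketch, not statements taken from RISCVMatInt.cpp):

```python
XLEN = 64            # assumed register width (RV64)
ADDI_IMM_BITS = 12   # width of the ADDI immediate
PAIRS = 3            # the three (SLLI, ADDI) pairs from the commit message

# Bits still to be produced by the initial LUI+ADDI(W) once the pairs are peeled off.
bits_left_for_base = XLEN - PAIRS * ADDI_IMM_BITS
# Two instructions for the base, then two per (SLLI, ADDI) pair.
instruction_count = 2 + 2 * PAIRS

print(bits_left_for_base)  # 28
print(instruction_count)   # 8
```

With only about 28 bits left for the base, the value handled by the LUI pair stays below the full 32-bit range.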
Commit: c91fa95fc7426215817c3d20564558a310d587e2
https://github.com/llvm/llvm-project/commit/c91fa95fc7426215817c3d20564558a310d587e2
Author: Aiden Grossman <aidengrossman at google.com>
Date: 2025-09-19 (Fri, 19 Sep 2025)
Changed paths:
M llvm/lib/Transforms/IPO/SampleProfile.cpp
Log Message:
-----------
[SampleProfile] Always use FAM to get ORE
The split in this code path was left over from when we had to support
the old PM and the new PM at the same time. Now that the legacy pass has
been dropped, this simplifies the code a little bit and swaps pointers
for references in a couple places.
Reviewers: aeubanks, efriedma-quic, wlei-llvm
Reviewed By: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/159858
Commit: 4a9fdda9882da8f054009c249c4bb0ef18f6f21a
https://github.com/llvm/llvm-project/commit/4a9fdda9882da8f054009c249c4bb0ef18f6f21a
Author: Roman Belenov <rbelenov at gmail.com>
Date: 2025-09-19 (Fri, 19 Sep 2025)
Changed paths:
M llvm/docs/CommandGuide/llvm-mca.rst
M llvm/include/llvm/MCA/CustomBehaviour.h
M llvm/include/llvm/MCA/InstrBuilder.h
M llvm/lib/MCA/CustomBehaviour.cpp
M llvm/lib/MCA/InstrBuilder.cpp
A llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s
A llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s
M llvm/tools/llvm-mca/llvm-mca.cpp
M llvm/unittests/tools/llvm-mca/MCATestBase.cpp
M llvm/unittests/tools/llvm-mca/MCATestBase.h
M llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp
Log Message:
-----------
[MCA] Enable customization of individual instructions (#155420)
Currently MCA takes instruction properties from the scheduling model.
However, some instructions may execute differently depending on external
factors - for example, the latency of memory instructions may vary
depending on whether the load comes from L1 cache, L2 or
DRAM. While MCA as a static analysis tool cannot model such differences
(and currently makes some static decisions, e.g. all memory ops are
treated as L1 accesses), it makes sense to allow manual modification of
instruction properties to model different behavior (e.g. sensitivity of
code performance to cache misses in a particular load instruction). This
patch addresses this need.
The library modification is intentionally generic - arbitrary
modifications to InstrDesc are allowed. The tool support is currently
limited to changing instruction latencies (a single number applies to all
output arguments and MaxLatency) via comments in the input assembler
code; the format is like this:
add (%eax), eax // LLVM-MCA-LATENCY:100
Users of the MCA library can already make additional customizations; the
command-line tool can be extended in the future.
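As an illustration of the comment format only, here is a minimal sketch that collects latency overrides from assembly text. It is not the parser used by llvm-mca. The function name is hypothetical, and tolerating whitespace around the colon is an assumption of this sketch rather than documented behavior.

```python
import re
from typing import Dict, List

# Assumed pattern: the marker text from the commit message followed by a
# decimal integer. Allowing spaces around ':' is this sketch's own choice.
_LATENCY_RE = re.compile(r"LLVM-MCA-LATENCY\s*:\s*(\d+)")


def latency_overrides(asm_lines: List[str]) -> Dict[int, int]:
    """Map line index to requested latency for lines that carry the marker."""
    overrides: Dict[int, int] = {}
    for index, line in enumerate(asm_lines):
        match = _LATENCY_RE.search(line)
        if match:
            overrides[index] = int(match.group(1))
    return overrides


lines = [
    "add (%eax), eax // LLVM-MCA-LATENCY:100",
    "mov %ebx, %ecx",
]
print(latency_overrides(lines))  # {0: 100}
```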
Note that InstructionView currently shows per-instruction information
according to the scheduling model and is not affected by this change.
See https://github.com/llvm/llvm-project/issues/133429 for additional
clarifications (including an explanation of why existing customization
mechanisms do not provide the required functionality).
---------
Co-authored-by: Min-Yih Hsu <min at myhsu.dev>
Commit: 9542d0a0c661be92db950514b5dc9c5ea6d953af
https://github.com/llvm/llvm-project/commit/9542d0a0c661be92db950514b5dc9c5ea6d953af
Author: Joseph Huber <huberjn at outlook.com>
Date: 2025-09-19 (Fri, 19 Sep 2025)
Changed paths:
M llvm/cmake/modules/HandleLLVMOptions.cmake
Log Message:
-----------
[libc] Fix libc build on NVPTX using wrong linker flag
Summary:
Ugly hacks abound. We can't actually test linker flags generically,
because not everyone has `nvlink` as a binary on their machine, which
would result in every single flag being reported as unsupported.
This is the only 'linker flag' check we have, so just hard code it off.
Commit: a38794ff3d47588cb226881eb048cb2333962ab9
https://github.com/llvm/llvm-project/commit/a38794ff3d47588cb226881eb048cb2333962ab9
Author: Aiden Grossman <aidengrossman at google.com>
Date: 2025-09-19 (Fri, 19 Sep 2025)
Changed paths:
M clang/test/ClangScanDeps/pr61006.cppm
M clang/test/ClangScanDeps/resource_directory.c
M clang/test/Driver/env.c
M clang/test/Driver/program-path-priority.c
M clang/test/Modules/relative-resource-dir.m
Log Message:
-----------
[Clang] Rewrite tests using subshells to set env variables
Now that we have the %readfile substitution, we can rewrite these tests
that were using env variable subshells to write the output of the
command into a file and then load it where it is needed using readfile.
This does involve one invocation of bash so that we are using the system
env binary, which does support redirection into a tool like grep. We
already do this in one LLVM test. I'm not happy about needing that, but
the more principled way to solve it involves reworking how in-process
builtins work within lit. That is something we want to do eventually,
but not something that I think should block this patch.
Reviewers: cmtice, petrhosek, ilovepi
Reviewed By: cmtice, ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/158446
Commit: 9c246a9c12af9e21ca912ae94ec5816cb684a924
https://github.com/llvm/llvm-project/commit/9c246a9c12af9e21ca912ae94ec5816cb684a924
Author: Aiden Grossman <aidengrossman at google.com>
Date: 2025-09-20 (Sat, 20 Sep 2025)
Changed paths:
M llvm/cmake/modules/HandleLLVMOptions.cmake
M llvm/docs/CommandGuide/llvm-mca.rst
M llvm/include/llvm/Analysis/DependenceAnalysis.h
M llvm/include/llvm/MCA/CustomBehaviour.h
M llvm/include/llvm/MCA/InstrBuilder.h
M llvm/lib/Analysis/DependenceAnalysis.cpp
M llvm/lib/MCA/CustomBehaviour.cpp
M llvm/lib/MCA/InstrBuilder.cpp
M llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
M llvm/lib/Transforms/IPO/SampleProfile.cpp
M llvm/lib/Transforms/Scalar/SROA.cpp
A llvm/test/Analysis/DependenceAnalysis/SameSDLoops.ll
A llvm/test/Transforms/SROA/vector-promotion-cannot-tree-structure-merge.ll
A llvm/test/Transforms/SROA/vector-promotion-via-tree-structure-merge.ll
A llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s
A llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s
M llvm/tools/llvm-mca/llvm-mca.cpp
M llvm/unittests/tools/llvm-mca/MCATestBase.cpp
M llvm/unittests/tools/llvm-mca/MCATestBase.h
M llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp
Log Message:
-----------
[𝘀𝗽𝗿] changes introduced through rebase
Created using spr 1.3.6
[skip ci]
Compare: https://github.com/llvm/llvm-project/compare/93f6691db215...9c246a9c12af