[all-commits] [llvm/llvm-project] d5488f: [AArch64] Combine separate vector and scalar table...

Tue Feb 4 08:30:57 PST 2025

  Branch: refs/heads/users/alexey-bataev/spr/slpgather-scalarized-calls
  Home:   https://github.com/llvm/llvm-project
  Commit: d5488f157c74332646d2b6e9d16c88e61d5a789e
      https://github.com/llvm/llvm-project/commit/d5488f157c74332646d2b6e9d16c88e61d5a789e
  Author: Craig Topper <craig.topper at sifive.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M llvm/lib/Target/AArch64/AArch64InstrInfo.td

  Log Message:
  -----------
  [AArch64] Combine separate vector and scalar tablegen SDNode record for AArch64ISD::REV16. NFC (#125614)

Relax the SDTypeProfile for AArch64ISD::REV32/REV64 to remove the
requirement that the type be vector.

It's not a good idea to have two different SDNode records with different
SDTypeProfiles. SDTypeProfiles are used to remove some unneeded checks
from the GenDAGISel.inc. Having different SDTypeProfiles can cause
checks to be removed that can create ambiguous matches, but that did not
happen in this case.

With this change the AArchGenDAGISel.inc is identical. The only change
is AArch64GenGlobalISel.inc which now includes scalar patterns for
G_REV16 due to them now being picks up by an SDNodeEquiv. GISel does not
yet use G_REV16 for scalars so this is not a functional change.

  Commit: f7aad60cd1a538fb1eb5ab861f8c29ddba5283a4
      https://github.com/llvm/llvm-project/commit/f7aad60cd1a538fb1eb5ab861f8c29ddba5283a4
  Author: Piotr Fusik <p.fusik at samsung.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpgather.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpscatter.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsll.ll
    M llvm/test/CodeGen/RISCV/rvv/mgather-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/mscatter-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vpgather-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vpscatter-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vwsll-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vwsll-vp.ll

  Log Message:
  -----------
  [RISCV] Fold vector shift of sext/zext to widening multiply (#121563)

    (shl (sext X), C) -> (vwmulsu X, 1u << C)
    (shl (zext X), C) -> (vwmulu  X, 1u << C)

  Commit: 389d1359f330c55098d75f00efe03749943d98e7
      https://github.com/llvm/llvm-project/commit/389d1359f330c55098d75f00efe03749943d98e7
  Author: Jerry-Ge <jerry.ge at arm.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M mlir/include/mlir/Dialect/Tosa/IR/TosaOps.td

  Log Message:
  -----------
  [TOSA] fix TileOp description (#125707)

Simple textual fix to match TOSA v1.0 specification:
https://www.mlplatform.org/tosa/tosa_spec.html#_tile

Signed-off-by: Arteen Abrishami <arteen.abrishami at arm.com>
Co-authored-by: Arteen Abrishami <arteen.abrishami at arm.com>

  Commit: bd30838422bc31c90ae6e7119c433159d351ff05
      https://github.com/llvm/llvm-project/commit/bd30838422bc31c90ae6e7119c433159d351ff05
  Author: Razvan Lupusoru <razvan.lupusoru at gmail.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M flang/include/flang/Lower/DirectivesCommon.h
    M flang/include/flang/Optimizer/Builder/DirectivesCommon.h
    M flang/lib/Lower/OpenACC.cpp
    M flang/test/Lower/OpenACC/acc-bounds.f90
    A flang/test/Lower/OpenACC/acc-data-operands-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-data-operands.f90
    A flang/test/Lower/OpenACC/acc-data-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-data.f90
    A flang/test/Lower/OpenACC/acc-declare-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-declare.f90
    A flang/test/Lower/OpenACC/acc-enter-data-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-enter-data.f90
    A flang/test/Lower/OpenACC/acc-exit-data-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-exit-data.f90
    A flang/test/Lower/OpenACC/acc-host-data-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-host-data.f90
    M flang/test/Lower/OpenACC/acc-kernels-loop.f90
    M flang/test/Lower/OpenACC/acc-kernels.f90
    M flang/test/Lower/OpenACC/acc-loop.f90
    M flang/test/Lower/OpenACC/acc-parallel-loop.f90
    M flang/test/Lower/OpenACC/acc-parallel.f90
    A flang/test/Lower/OpenACC/acc-private-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-private.f90
    A flang/test/Lower/OpenACC/acc-reduction-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-reduction.f90
    M flang/test/Lower/OpenACC/acc-serial-loop.f90
    M flang/test/Lower/OpenACC/acc-serial.f90
    M flang/test/Lower/OpenACC/acc-update.f90
    M mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
    M mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp

  Log Message:
  -----------
  [flang][acc] Improve acc lowering around fir.box and arrays (#125600)

The current implementation of OpenACC lowering includes explicit
expansion of following cases:
- Creation of `acc.bounds` operations for all arrays, including those
whose dimensions are captured in the type (eg `!fir.array<100xf32>`)
- Expansion of box types by only putting the box's address in the data
clause. The address was extracted with a `fir.box_addr` operation and
the bounds were filled with `fir.box_dims` operation.

However, with the creation of the new type interface `MappableType`, the
idea is that specific type-based semantics can now be used. This also
really simplifies representation in the IR. Consider the following
example:
```
subroutine sub(arr)
  real :: arr(:)
  !$acc enter data copyin(arr)
end subroutine
```

Before the current PR, the relevant acc dialect IR looked like:
```
func.func @_QPsub(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name =
"arr"}) {
  ...
  %1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFsubEarr"} :
(!fir.box<!fir.array<?xf32>>, !fir.dscope) ->
(!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %2:3 = fir.box_dims %1#0, %c0 : (!fir.box<!fir.array<?xf32>>, index)
-> (index, index, index)
  %c0_0 = arith.constant 0 : index
  %3 = arith.subi %2#1, %c1 : index
  %4 = acc.bounds lowerbound(%c0_0 : index) upperbound(%3 : index)
extent(%2#1 : index) stride(%2#2 : index) startIdx(%c1 : index)
{strideInBytes = true}
  %5 = fir.box_addr %1#0 : (!fir.box<!fir.array<?xf32>>) ->
!fir.ref<!fir.array<?xf32>>
  %6 = acc.copyin varPtr(%5 : !fir.ref<!fir.array<?xf32>>) bounds(%4) ->
!fir.ref<!fir.array<?xf32>> {name = "arr", structured = false}
  acc.enter_data dataOperands(%6 : !fir.ref<!fir.array<?xf32>>)
```

After the current change, it looks like:
```
func.func @_QPsub(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name =
"arr"}) {
  ...
  %1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFsubEarr"} :
(!fir.box<!fir.array<?xf32>>, !fir.dscope) ->
(!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
  %2 = acc.copyin var(%1#0 : !fir.box<!fir.array<?xf32>>) ->
!fir.box<!fir.array<?xf32>> {name = "arr", structured = false}
  acc.enter_data dataOperands(%2 : !fir.box<!fir.array<?xf32>>)
```

Restoring the old behavior can be done with following command line
options:
`--openacc-unwrap-fir-box=true --openacc-generate-default-bounds=true`

  Commit: d8148244e9be9d4c7b12abbdbf275d80d5ba57a5
      https://github.com/llvm/llvm-project/commit/d8148244e9be9d4c7b12abbdbf275d80d5ba57a5
  Author: Nikolas Klauser <nikolasklauser at berlin.de>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M libcxx/include/__string/constexpr_c_functions.h

  Log Message:
  -----------
  [libc++] Decrease instantiation cost of __constexpr_memmove (#125109)

Using `if constexpr` in `__constexpr_memmove` makes the instantiation
three times faster for the same type, since it avoids a bunch of class
instantiations and SFINAE for constexpr support that's never actually
used. Given that `__constexpr_memmove` is used quite a bit through
`std::copy` and is instantiated multiple times when just including
`<__string/char_traits.h>` this can provide a nice compile time speedup
for a very simple change.

  Commit: 6515fdf73de724d21b6c807ad75f2139c1d7af32
      https://github.com/llvm/llvm-project/commit/6515fdf73de724d21b6c807ad75f2139c1d7af32
  Author: Brox Chen <guochen2 at amd.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M llvm/lib/Target/AMDGPU/SIInstructions.td
    M llvm/test/CodeGen/AMDGPU/minimummaximum.ll
    M llvm/test/CodeGen/AMDGPU/minmax.ll

  Log Message:
  -----------
  [AMDGPU][True16][CodeGen] true16 codegen for FPMinMax pat (#125107)

true16 codegen for FPMinMax Pattern

  Commit: 5eff19f48b6493d52eeab74d9a81867d49f61bbb
      https://github.com/llvm/llvm-project/commit/5eff19f48b6493d52eeab74d9a81867d49f61bbb
  Author: Brox Chen <guochen2 at amd.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M llvm/lib/Target/AMDGPU/SIInstructions.td
    A llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.gfx11plus-fake16.mir
    A llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.gfx11plus.mir
    M llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.mir

  Log Message:
  -----------
  [AMDGPU][True16][Codegen] true16 codegen for FPtoI1 (#125120)

True16 codegen for FPtoi1.

It seems tablegen figured out the pattern even without this pat in
place, and the fptoui/fptosi.ll already got the right transformation.
Aditionally updated the mir file and split it to pre-gfx11 and
post-gfx11.

  Commit: bae97e1976e44066dfad5d84fb921165e6588e2d
      https://github.com/llvm/llvm-project/commit/bae97e1976e44066dfad5d84fb921165e6588e2d
  Author: David Spickett <david.spickett at linaro.org>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M clang/include/clang/AST/DeclTemplate.h
    M clang/include/clang/Sema/Sema.h
    M clang/lib/AST/ASTImporter.cpp
    M clang/lib/AST/DeclTemplate.cpp
    M clang/lib/AST/JSONNodeDumper.cpp
    M clang/lib/AST/TextNodeDumper.cpp
    M clang/lib/Sema/SemaTemplate.cpp
    M clang/lib/Sema/SemaTemplateDeduction.cpp
    M clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
    M clang/lib/Sema/SemaType.cpp
    M clang/lib/Serialization/ASTReaderDecl.cpp
    M clang/lib/Serialization/ASTWriterDecl.cpp
    M clang/test/AST/ast-dump-templates.cpp
    M clang/test/AST/gen_ast_dump_json_test.py
    M clang/test/SemaTemplate/cwg2398.cpp

  Log Message:
  -----------
  Revert "[clang] fix P3310 overload resolution flag propagation" (#125710)

Reverts llvm/llvm-project#125372 due to lldb builds failing:
https://lab.llvm.org/buildbot/#/builders/59/builds/12223

We need to decide how to update LLDB's code.

  Commit: 5ca136d0e723029e6bef894961701b6ca1b6cd29
      https://github.com/llvm/llvm-project/commit/5ca136d0e723029e6bef894961701b6ca1b6cd29
  Author: Alexey Bataev <a.bataev at outlook.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll

  Log Message:
  -----------
  [SLP][NFC]Replace undefs with just poison in the test

  Commit: 25daf7bb3934e80b395b3ced53e812d314cb1c86
      https://github.com/llvm/llvm-project/commit/25daf7bb3934e80b395b3ced53e812d314cb1c86
  Author: Kerry McLaughlin <kerry.mclaughlin at arm.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
    M llvm/lib/Target/AArch64/SMEInstrFormats.td
    M llvm/lib/Target/AArch64/SMEPeepholeOpt.cpp
    A llvm/test/CodeGen/AArch64/fp8-sme2-cvtn.ll
    A llvm/test/CodeGen/AArch64/luti-with-sme2.ll
    A llvm/test/CodeGen/AArch64/perm-tb-with-sme2.ll
    M llvm/test/CodeGen/AArch64/sme2-fp8-intrinsics-cvt.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-add.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-fp-dots.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-insert-mova.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-int-dots.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-qcvt.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-qrshr.ll

  Log Message:
  -----------
  [AArch64][SME] Extend FORM_TRANSPOSED pseudos to all multi-vector intrinsics (#124258)

All patterns for multi-vector intrinsics should try to use the FORM_TRANSPOSED
pseudos so that they can benefit from register allocation hints when SME is available.

This patch removes the post-isel hook for the pseudo and instead extends the
SMEPeepholeOpt pass to replace a REG_SEQENCE with the pseudo if the
expected pattern of StridedOrContiguous copies is found. With this change,
the tablegen patterns for the intrinsics can remain unchanged.

One test has been added for each multiclass this affects.

  Commit: 0358b705d40fd17e960a21476ab74531a3212f39
      https://github.com/llvm/llvm-project/commit/0358b705d40fd17e960a21476ab74531a3212f39
  Author: Alexey Bataev <a.bataev at outlook.com>
  Date:   2025-02-04 (Tue, 04 Feb 2025)

  Changed paths:
    M clang/include/clang/AST/DeclTemplate.h
    M clang/include/clang/Sema/Sema.h
    M clang/lib/AST/ASTImporter.cpp
    M clang/lib/AST/DeclTemplate.cpp
    M clang/lib/AST/JSONNodeDumper.cpp
    M clang/lib/AST/TextNodeDumper.cpp
    M clang/lib/Sema/SemaTemplate.cpp
    M clang/lib/Sema/SemaTemplateDeduction.cpp
    M clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
    M clang/lib/Sema/SemaType.cpp
    M clang/lib/Serialization/ASTReaderDecl.cpp
    M clang/lib/Serialization/ASTWriterDecl.cpp
    M clang/test/AST/ast-dump-templates.cpp
    M clang/test/AST/gen_ast_dump_json_test.py
    M clang/test/SemaTemplate/cwg2398.cpp
    M flang/include/flang/Lower/DirectivesCommon.h
    M flang/include/flang/Optimizer/Builder/DirectivesCommon.h
    M flang/lib/Lower/OpenACC.cpp
    M flang/test/Lower/OpenACC/acc-bounds.f90
    A flang/test/Lower/OpenACC/acc-data-operands-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-data-operands.f90
    A flang/test/Lower/OpenACC/acc-data-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-data.f90
    A flang/test/Lower/OpenACC/acc-declare-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-declare.f90
    A flang/test/Lower/OpenACC/acc-enter-data-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-enter-data.f90
    A flang/test/Lower/OpenACC/acc-exit-data-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-exit-data.f90
    A flang/test/Lower/OpenACC/acc-host-data-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-host-data.f90
    M flang/test/Lower/OpenACC/acc-kernels-loop.f90
    M flang/test/Lower/OpenACC/acc-kernels.f90
    M flang/test/Lower/OpenACC/acc-loop.f90
    M flang/test/Lower/OpenACC/acc-parallel-loop.f90
    M flang/test/Lower/OpenACC/acc-parallel.f90
    A flang/test/Lower/OpenACC/acc-private-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-private.f90
    A flang/test/Lower/OpenACC/acc-reduction-unwrap-defaultbounds.f90
    M flang/test/Lower/OpenACC/acc-reduction.f90
    M flang/test/Lower/OpenACC/acc-serial-loop.f90
    M flang/test/Lower/OpenACC/acc-serial.f90
    M flang/test/Lower/OpenACC/acc-update.f90
    M libcxx/include/__string/constexpr_c_functions.h
    M llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
    M llvm/lib/Target/AArch64/AArch64InstrInfo.td
    M llvm/lib/Target/AArch64/SMEInstrFormats.td
    M llvm/lib/Target/AArch64/SMEPeepholeOpt.cpp
    M llvm/lib/Target/AMDGPU/SIInstructions.td
    M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
    A llvm/test/CodeGen/AArch64/fp8-sme2-cvtn.ll
    A llvm/test/CodeGen/AArch64/luti-with-sme2.ll
    A llvm/test/CodeGen/AArch64/perm-tb-with-sme2.ll
    M llvm/test/CodeGen/AArch64/sme2-fp8-intrinsics-cvt.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-add.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-fp-dots.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-insert-mova.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-int-dots.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-qcvt.ll
    M llvm/test/CodeGen/AArch64/sme2-intrinsics-qrshr.ll
    A llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.gfx11plus-fake16.mir
    A llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.gfx11plus.mir
    M llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.mir
    M llvm/test/CodeGen/AMDGPU/minimummaximum.ll
    M llvm/test/CodeGen/AMDGPU/minmax.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpgather.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpscatter.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsll.ll
    M llvm/test/CodeGen/RISCV/rvv/mgather-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/mscatter-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vpgather-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vpscatter-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vwsll-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vwsll-vp.ll
    M llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll
    M mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
    M mlir/include/mlir/Dialect/Tosa/IR/TosaOps.td
    M mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp

  Log Message:
  -----------
  Rebase

Created using spr 1.3.5

Compare: https://github.com/llvm/llvm-project/compare/1eddb79f6708...0358b705d40f

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications